Accurate and Efficient Cell Lineage Tree Inference from Noisy Single Cell Data: the Maximum Likelihood Perfect Phylogeny Approach

Mapping Intimacies ◽

10.1101/742395 ◽

2019 ◽

Author(s):

Yufeng Wu

Keyword(s):

Single Cell ◽

Cell Lineage ◽

Large Data ◽

Genomic Variation ◽

Perfect Phylogeny ◽

Tree Inference ◽

Lineage Tree ◽

Infinite Sites Model ◽

New Applications ◽

Cell Data

Cells in an organism share a common evolutionary history, called the cell lineage tree. The cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually has non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling-based and can be very slow for large data. In this paper, we propose a new method called ScisTree, which infers the cell lineage tree and calls genotypes from noisy single cell genotype data. Unlike most existing approaches, ScisTree works with genotype probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring the cell lineage tree and calling the genotypes that allow the so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well in terms of the accuracy of inferred trees, and is much more efficient than existing methods. The efficiency of ScisTree enables new applications including imputation of the so-called doublets. Availability: The program ScisTree is available for download at: https://github.com/yufengwudcs/ScisTree.

Accurate and efficient cell lineage tree inference from noisy single cell data: the maximum likelihood perfect phylogeny approach

Bioinformatics ◽

10.1093/bioinformatics/btz676 ◽

2019 ◽

Author(s):

Yufeng Wu

Keyword(s):

Single Cell ◽

Cell Lineage ◽

Large Data ◽

Genomic Variation ◽

Supplementary Information ◽

Perfect Phylogeny ◽

Tree Inference ◽

Lineage Tree ◽

Infinite Sites Model ◽

Cell Data

Motivation: Cells in an organism share a common evolutionary history, called the cell lineage tree. The cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually has non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling based and can be very slow for large data. Results: In this article, we propose a new method called ScisTree, which infers the cell lineage tree and calls genotypes from noisy single cell genotype data. Unlike most existing approaches, ScisTree works with genotype probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring the cell lineage tree and calling the genotypes that allow the so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well in terms of the accuracy of inferred trees, and is much more efficient than existing methods. The efficiency of ScisTree enables new applications including imputation of the so-called doublets. Availability and implementation: The program ScisTree is available for download at: https://github.com/yufengwudcs/ScisTree. Supplementary information: Supplementary data are available at Bioinformatics online.
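The two criteria named in this abstract, compatibility with a perfect phylogeny and the likelihood of the called genotypes, can be illustrated with a small sketch. This is not ScisTree's heuristic. It assumes binary genotypes with ancestral state 0, independent genotypes, and a hypothetical matrix `prob_zero` holding the probability that each genotype is 0; the function names are illustrative only.

```python
import math
from itertools import combinations


def is_perfect_phylogeny(genotypes):
    """Three-gamete test on a binary cells-by-sites matrix (ancestral state 0).

    With an all-zero root, a perfect phylogeny exists if and only if no pair
    of sites shows all three patterns (0,1), (1,0) and (1,1) across the cells.
    """
    n_sites = len(genotypes[0])
    for a, b in combinations(range(n_sites), 2):
        patterns = {(row[a], row[b]) for row in genotypes}
        if {(0, 1), (1, 0), (1, 1)} <= patterns:
            return False
    return True


def log_likelihood(genotypes, prob_zero):
    """Log-likelihood of called genotypes, treating genotypes as independent.

    prob_zero[i][j] is the assumed probability that cell i is 0 at site j.
    """
    total = 0.0
    for g_row, p_row in zip(genotypes, prob_zero):
        for g, p0 in zip(g_row, p_row):
            total += math.log(p0 if g == 0 else 1.0 - p0)
    return total


# Hypothetical probabilities for 3 cells and 2 sites.
prob_zero = [[0.1, 0.8], [0.2, 0.3], [0.9, 0.4]]

# Calling each genotype independently gives a matrix with a site conflict.
naive = [[0 if p0 >= 0.5 else 1 for p0 in row] for row in prob_zero]

# Flipping the least certain conflicting call restores compatibility.
repaired = [[1, 0], [1, 1], [0, 0]]

print(is_perfect_phylogeny(naive), log_likelihood(naive, prob_zero))
print(is_perfect_phylogeny(repaired), log_likelihood(repaired, prob_zero))
```

In this toy case the independently called matrix has the higher likelihood but admits no perfect phylogeny, while the repaired matrix is compatible at a small likelihood cost. The search for the best such compatible matrix is the part that the method described above addresses with a heuristic.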

Scelestial: fast and accurate single-cell lineage tree inference based on a Steiner tree approximation algorithm

10.1101/2021.05.24.445405 ◽

2021 ◽

Author(s):

Mohammad-Hadi Foroughmand-Araabi ◽

Sama Goliaei ◽

Alice Carolyn McHardy

Keyword(s):

Approximation Algorithm ◽

Single Cell ◽

Steiner Tree ◽

Missing Values ◽

Cell Lineage ◽

Error Rates ◽

Steiner Tree Problem ◽

Tree Reconstruction ◽

Tree Inference ◽

Lineage Tree

Single-cell genome sequencing provides a highly granular view of biological systems but is affected by high error rates, allelic amplification bias, and uneven genome coverage. This creates a need for data-specific computational methods, for purposes such as cell lineage tree inference. The objective of cell lineage tree reconstruction is to infer the evolutionary process that generated a set of observed cell genomes. Lineage trees may enable a better understanding of tumor formation and growth, as well as of organ development for healthy body cells. We describe a method, Scelestial, for lineage tree reconstruction from single-cell data, which is based on an approximation algorithm for the Steiner tree problem and is a generalization of the neighbor-joining method. We adapt the algorithm to efficiently select a limited subset of potential sequences as internal nodes, in the presence of missing values, and to minimize cost by lineage tree-based missing value imputation. In a comparison against seven state-of-the-art single-cell lineage tree reconstruction algorithms (BitPhylogeny, OncoNEM, SCITE, SiFit, SASC, SCIPhI, and SiCloneFit) on simulated and real single-cell tumor samples, Scelestial performed best at reconstructing trees in terms of accuracy and run time. Scelestial has been implemented in C++. It is also available as an R package named RScelestial.

Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo

Science ◽

10.1126/science.aar4362 ◽

2018 ◽

Vol 360 (6392) ◽

pp. 981-987 ◽

Cited By ~ 278

Author(s):

Daniel E. Wagner ◽

Caleb Weinreb ◽

Zach M. Collins ◽

James A. Briggs ◽

Sean G. Megason ◽

...

Keyword(s):

Single Cell ◽

Zebrafish Embryo ◽

Cell Lineage ◽

Vertebrate Development ◽

Cell Mapping ◽

Cell Fates ◽

Web Based ◽

A Cell ◽

Cell Data ◽

Germ Layer Formation

High-throughput mapping of cellular differentiation hierarchies from single-cell data promises to empower systematic interrogations of vertebrate development and disease. Here we applied single-cell RNA sequencing to >92,000 cells from zebrafish embryos during the first day of development. Using a graph-based approach, we mapped a cell-state landscape that describes axis patterning, germ layer formation, and organogenesis. We tested how clonally related cells traverse this landscape by developing a transposon-based barcoding approach (TracerSeq) for reconstructing single-cell lineage histories. Clonally related cells were often restricted by the state landscape, including a case in which two independent lineages converge on similar fates. Cell fates remained restricted to this landscape in embryos lacking the chordin gene. We provide web-based resources for further analysis of the single-cell data.

Robust Lineage Reconstruction from High-Dimensional Single-Cell Data

10.1101/036533 ◽

2016 ◽

Author(s):

Gregory Giecold ◽

Eugenio Marco ◽

Lorenzo Trippa ◽

Guo-Cheng Yuan

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Quantitative Estimate ◽

Cell Lineage ◽

Computational Method ◽

Expression Data ◽

Cell Gene Expression ◽

Cell Data ◽

Cell Gene

Single-cell gene expression data provide invaluable resources for systematic characterization of cellular hierarchy in multi-cellular organisms. However, cell lineage reconstruction is still often associated with significant uncertainty due to technological constraints. Such uncertainties have not been taken into account in current methods. We present ECLAIR, a novel computational method for the statistical inference of cell lineage relationships from single-cell gene expression data. ECLAIR uses an ensemble approach to improve the robustness of lineage predictions, and provides a quantitative estimate of the uncertainty of lineage branchings. We show that the application of ECLAIR to published datasets successfully reconstructs known lineage relationships and significantly improves the robustness of predictions. In conclusion, ECLAIR is a powerful bioinformatics tool for single-cell data analysis. It can be used for robust lineage reconstruction with a quantitative estimate of prediction accuracy.

Decision tree models and cell fate choice

10.1101/2020.12.19.423629 ◽

2020 ◽

Author(s):

Ivan Croydon Veleslavov ◽

Michael P.H. Stumpf

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Fate ◽

Cell Types ◽

Lineage Tree ◽

Tree Models ◽

Fate Decision ◽

Average Gene ◽

Lineage Trees ◽

Cell Data

Single cell transcriptomics has laid bare the heterogeneity of apparently identical cells at the level of gene expression. For many cell types we now know that there is variability in the abundance of many transcripts, and that average transcript abundance or average gene expression can be an unhelpful concept. A range of clustering and other classification methods have been proposed which use the signal in single cell data to classify cells, that is, to assign cell types to cells based on their transcriptomic states. In many cases, however, we would like to have not just a classifier, but also a set of interpretable rules by which this classification occurs. Here we develop and demonstrate the interpretive power of one such approach, which sets out to establish a biologically interpretable classification scheme. In particular we are interested in capturing the chain of regulatory events that drive cell-fate decision making across a lineage tree or lineage sequence. We find that suitably defined decision trees can help to resolve gene regulatory programs involved in shaping lineage trees. Our approach combines predictive power with interpretability and can extract logical rules from single cell data.

Conifer: clonal tree inference for tumor heterogeneity with single-cell and bulk sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-04338-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Leila Baghaarabani ◽

Sama Goliaei ◽

Mohammad-Hadi Foroughmand-Araabi ◽

Seyed Peyman Shariatpanahi ◽

Bahram Goliaei

Keyword(s):

Single Cell ◽

Tumor Heterogeneity ◽

Temporal Order ◽

Variant Allele ◽

Evolutionary Relationships ◽

Sequencing Data ◽

Variant Allele Frequency ◽

Single Cell Sequencing ◽

Tree Inference ◽

Cell Data

Background: Genetic heterogeneity of a cancer tumor that develops during clonal evolution is one of the reasons for cancer treatment failure, because it increases the chance of drug resistance. Clones are cell populations with different genotypes, resulting from differences in somatic mutations that occur and accumulate during cancer development. An appropriate approach for identifying clones is determining the variant allele frequency of mutations that occurred in the tumor. Although bulk sequencing data can be used to provide that information, the frequencies are not informative enough for identifying different clones with the same prevalence and their evolutionary relationships. On the other hand, single-cell sequencing data provides valuable information about branching events in the evolution of a cancerous tumor. However, the temporal order of mutations may be determined with ambiguities using only single-cell data, while variant allele frequencies from bulk sequencing data can provide beneficial information for inferring the temporal order of mutations with fewer ambiguities. Results: In this study, a new method called Conifer (ClONal tree Inference For hEterogeneity of tumoR) is proposed, which combines aggregated variant allele frequency from bulk sequencing data with branching event information from single-cell sequencing data to more accurately identify clones and their evolutionary relationships. Conifer is shown to increase the accuracy of clone identification and clonal tree inference compared to other existing methods on various sets of simulated data. In addition, the evolutionary tree provided by Conifer on real cancer data sets is shown to be highly consistent with information in both bulk and single-cell data. Conclusions: In this study, we have provided an accurate and robust method to identify clones of tumor heterogeneity and their evolutionary history by combining single-cell and bulk sequencing data.
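As a small illustration of the bulk-data limitation described in this abstract, the sketch below computes variant allele frequencies from hypothetical read counts. The mutation names and counts are invented for illustration, and the code is not part of Conifer.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads covering a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


# Hypothetical bulk read counts: mutation -> (variant reads, total reads).
bulk_counts = {"m1": (120, 400), "m2": (60, 200), "m3": (45, 300)}

vaf = {
    name: variant_allele_frequency(alt, total)
    for name, (alt, total) in bulk_counts.items()
}

for name, value in vaf.items():
    print(name, round(value, 3))
```

Here `m1` and `m2` have the same frequency, so bulk data alone cannot say whether they belong to one clone or to two clones of equal prevalence. That is the kind of ambiguity that branching information from single cells can help resolve.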

Single-cell Lineage Tracing by Integrating CRISPR-Cas9 Mutations with Transcriptomic Data

10.1101/630814 ◽

2019 ◽

Cited By ~ 4

Author(s):

Hamim Zafar ◽

Chieh Lin ◽

Ziv Bar-Joseph

Keyword(s):

Single Cell ◽

Cell Lineage ◽

Lineage Tracing ◽

Expression Data ◽

Random Mutation ◽

Novel Technologies ◽

Gene Sets ◽

Lineage Tree ◽

Novel Method ◽

Differentiation Pathways

Recent studies combine two novel technologies, single-cell RNA-sequencing and CRISPR-Cas9 barcode editing, for elucidating developmental lineages at the whole organism level. While these studies provided several insights, they face several computational challenges. First, lineages are reconstructed based on noisy and often saturated random mutation data. Additionally, due to the randomness of the mutations, lineages from multiple experiments cannot be combined to reconstruct a consensus lineage tree. To address these issues we developed a novel method, LinTIMaT, which reconstructs cell lineages using a maximum-likelihood framework by integrating mutation and expression data. Our analysis shows that expression data helps resolve the ambiguities arising when lineages are inferred based on mutations alone, while also enabling the integration of different individual lineages for the reconstruction of a consensus lineage tree. LinTIMaT lineages have better cell type coherence, improve the functional significance of gene sets and provide new insights on progenitors and differentiation pathways.

Identifying Informative Gene Modules Across Modalities of Single Cell Genomics

10.1101/2020.02.06.937805 ◽

2020 ◽

Cited By ~ 2

Author(s):

David DeTomaso ◽

Nir Yosef

Keyword(s):

Single Cell ◽

Cell Lineage ◽

Rna Seq ◽

Informative Gene ◽

T Helper ◽

Multimodal Data ◽

Transcriptional Variation ◽

Lineage Tree ◽

A Cell ◽

Gene Modules

Two fundamental aims that emerge when analyzing single-cell RNA-seq data are identifying which genes vary in an informative manner and determining how these genes organize into modules. Here we propose a general approach to these problems that operates directly on a given metric of cell-cell similarity, allowing for its integration with any method (linear or nonlinear) for identifying the primary axes of transcriptional variation between cells. Additionally, we show that when using multimodal data, our procedure can be used to identify genes whose expression reflects alternative notions of similarity between cells, such as physical proximity in a tissue or clonal relatedness in a cell lineage tree. In this manner, we demonstrate that while our method, called Hotspot, is capable of identifying genes that reflect nuanced transcriptional variability between T helper cells, it can also identify spatially-dependent patterns of gene expression in the cerebellum as well as developmentally-heritable expression signatures during embryogenesis.

SCuPhr: A Probabilistic Framework for Cell Lineage Tree Reconstruction

10.1101/357442 ◽

2018 ◽

Cited By ~ 1

Author(s):

Hazal Koptagel ◽

Seong-Hwan Jun ◽

Jens Lagergren

Keyword(s):

Dna Sequencing ◽

Single Cell ◽

Copy Number ◽

Cell Lineage ◽

Lineage Tracing ◽

The Other ◽

Sequencing Data ◽

Data Set ◽

Detailed Model ◽

Lineage Tree

Reconstruction of cell lineage trees from single-cell DNA sequencing data has the potential to become a fundamental tool in the study of the development of disease, in particular cancer. For cells without copy number alterations that have not been exposed to specific marking techniques, that is, normal cells, lineage tracing is naturally based on somatic point mutations. Current single cell sequencing techniques applicable to such cells require an amplification step, which introduces errors, and still often suffer from so-called allelic dropout. We present a detailed model of current technologies for the purpose of estimating the distance between cells without copy number changes, based on single-cell DNA sequencing data. The model is well suited for full Bayesian analysis by introducing prior probabilities for key parameters, as well as for maximum a posteriori estimation using the expectation maximization algorithm. Our model outputs the distance between two cells, simultaneously taking all the other cells into account. In particular, the model contains variables associated with pairs of loci, of which one is homozygous and the other heterozygous, and has the capacity to perform Bayesian probabilistic read phasing. By applying a fast distance-based method, such as FNJ, to the estimated distances, a cell lineage tree can be obtained. In contrast to MCMC-based methods, FNJ can easily handle data sets with tens of thousands of taxa. The high accuracy of the resulting method, called SCuPhr, is shown in studies of several synthetic data sets.
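The final step described in this abstract, turning estimated pairwise distances into a tree with a distance-based method, can be sketched with plain neighbor joining. This is the textbook quadratic-per-step algorithm, not the faster FNJ variant and not SCuPhr's distance model. The cell labels and distance matrix are hypothetical, and branch lengths are omitted for brevity.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree topology as nested tuples by neighbor joining.

    names: list of unique labels; dist: symmetric matrix (list of lists) of
    pairwise distances in the same order as names.
    """
    nodes = list(names)
    # Dict-of-dicts so that merged clusters can be added by key.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        row_sum = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - row_sum[a] - row_sum[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)
        # Distance from the new internal node to each remaining node.
        d[new] = {new: 0.0}
        for c in nodes:
            if c != a and c != b:
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return tuple(nodes)


# Hypothetical distances between four cells A, B, C, D.
cells = ["A", "B", "C", "D"]
distances = [
    [0, 5, 8, 11],
    [5, 0, 9, 12],
    [8, 9, 0, 7],
    [11, 12, 7, 0],
]
tree = neighbor_joining(cells, distances)
print(tree)
```

For this additive toy matrix the result groups A with B and C with D. Each joining step here scans all pairs, so scaling to tens of thousands of taxa would need a faster variant such as the FNJ method named in the abstract.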

Analysis of cell size homeostasis at the single-cell and population level

10.1101/338632 ◽

2018 ◽

Author(s):

Philipp Thomas

Keyword(s):

Single Cell ◽

Cell Size ◽

Single Cells ◽

Population Level ◽

Power Laws ◽

Size Distributions ◽

Unified Framework ◽

Lineage Tree ◽

Cell To Cell Variability ◽

Cell Data

Growth pervades all areas of life from single cells to cell populations to tissues. However, cell size often fluctuates significantly from cell to cell and from generation to generation. Here we present a unified framework to predict the statistics of cell size variations within a lineage tree of a proliferating population. We analytically characterise (i) the distributions of cell size snapshots, (ii) the distribution within a population tree, and (iii) the distribution of lineages across the tree. Surprisingly, these size distributions differ significantly from observing single cells in isolation. In populations, cells seemingly grow to different sizes, typically exhibit less cell-to-cell variability and often display qualitatively different sensitivities to cell cycle noise and division errors. We demonstrate the key findings using recent single-cell data and elaborate on the implications for the ability of cells to maintain a narrow size distribution and the emergence of different power laws in these distributions.
