Accurate and Efficient Cell Lineage Tree Inference from Noisy Single Cell Data: the Maximum Likelihood Perfect Phylogeny Approach

2019 ◽  
Author(s):  
Yufeng Wu

Abstract. Cells in an organism share a common evolutionary history, called the cell lineage tree. The cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually has non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling-based and can be very slow for large data. In this paper, we propose a new method called ScisTree, which infers the cell lineage tree and calls genotypes from noisy single cell genotype data. Different from most existing approaches, ScisTree works with genotype probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring the cell lineage tree and calling the genotypes that allow the so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well on the accuracy of inferred trees, and is much more efficient than existing methods. The efficiency of ScisTree enables new applications including imputation of the so-called doublets. Availability: The program ScisTree is available for download at: https://github.com/yufengwudcs/[email protected]

2019 ◽  
Author(s):  
Yufeng Wu

Abstract. Motivation: Cells in an organism share a common evolutionary history, called the cell lineage tree. The cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually has non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling-based and can be very slow for large data. Results: In this article, we propose a new method called ScisTree, which infers the cell lineage tree and calls genotypes from noisy single cell genotype data. Different from most existing approaches, ScisTree works with genotype probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring the cell lineage tree and calling the genotypes that allow the so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well on the accuracy of inferred trees, and is much more efficient than existing methods. The efficiency of ScisTree enables new applications including imputation of the so-called doublets. Availability and implementation: The program ScisTree is available for download at: https://github.com/yufengwudcs/ScisTree. Supplementary information: Supplementary data are available at Bioinformatics online.
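To make the objective in this abstract concrete, here is a minimal sketch of the two ingredients it names: the likelihood of a set of called genotypes given per-genotype probabilities, and a check for whether binary genotypes admit a rooted perfect phylogeny. It is not ScisTree's heuristic. The function names, the `prob_zero` matrix layout (probability that each genotype is 0, strictly between 0 and 1), the independence of entries, and the toy numbers are all assumptions made for illustration.

```python
import math
from itertools import combinations

def max_likelihood_calls(prob_zero):
    """Call each genotype independently as its more probable state (0 or 1)."""
    return [[0 if p >= 0.5 else 1 for p in row] for row in prob_zero]

def log_likelihood(genotypes, prob_zero):
    """Sum of log-probabilities of the called genotypes (entries assumed independent)."""
    total = 0.0
    for g_row, p_row in zip(genotypes, prob_zero):
        for g, p in zip(g_row, p_row):
            total += math.log(p if g == 0 else 1.0 - p)
    return total

def admits_perfect_phylogeny(genotypes):
    """Rooted test with an all-zero ancestor: no pair of sites may show
    all three of the patterns (0,1), (1,0) and (1,1) across the cells."""
    n_sites = len(genotypes[0])
    for a, b in combinations(range(n_sites), 2):
        seen = {(row[a], row[b]) for row in genotypes}
        if {(0, 1), (1, 0), (1, 1)} <= seen:
            return False
    return True

# Hypothetical probabilities that each genotype is 0 (rows: cells, columns: sites).
prob_zero = [[0.9, 0.2], [0.1, 0.8], [0.3, 0.4]]
calls = max_likelihood_calls(prob_zero)

# Flip the least confident call (cell 2, site 1) to remove the conflict.
repaired = [row[:] for row in calls]
repaired[2][1] = 0

print(admits_perfect_phylogeny(calls), log_likelihood(calls, prob_zero))
print(admits_perfect_phylogeny(repaired), log_likelihood(repaired, prob_zero))
```

In this toy case the independent calls conflict with a perfect phylogeny, and the repaired matrix pays for consistency with a lower likelihood. A method of the kind described has to search for the consistent genotype matrix that loses as little likelihood as possible.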


2021 ◽  
Author(s):  
Mohammad-Hadi Foroughmand-Araabi ◽  
Sama Goliaei ◽  
Alice Carolyn McHardy

Single-cell genome sequencing provides a highly granular view of biological systems but is affected by high error rates, allelic amplification bias, and uneven genome coverage. This creates a need for data-specific computational methods, for purposes such as cell lineage tree inference. The objective of cell lineage tree reconstruction is to infer the evolutionary process that generated a set of observed cell genomes. Lineage trees may enable a better understanding of tumor formation and growth, as well as of organ development for healthy body cells. We describe a method, Scelestial, for lineage tree reconstruction from single-cell data, which is based on an approximation algorithm for the Steiner tree problem and is a generalization of the neighbor-joining method. We adapt the algorithm to efficiently select a limited subset of potential sequences as internal nodes, in the presence of missing values, and to minimize cost by lineage tree-based missing value imputation. In a comparison against seven state-of-the-art single-cell lineage tree reconstruction algorithms (BitPhylogeny, OncoNEM, SCITE, SiFit, SASC, SCIPhI, and SiCloneFit) on simulated and real single-cell tumor samples, Scelestial performed best at reconstructing trees in terms of accuracy and run time. Scelestial has been implemented in C++. It is also available as an R package named RScelestial.


Science ◽  
2018 ◽  
Vol 360 (6392) ◽  
pp. 981-987 ◽  
Author(s):  
Daniel E. Wagner ◽  
Caleb Weinreb ◽  
Zach M. Collins ◽  
James A. Briggs ◽  
Sean G. Megason ◽  
...  

High-throughput mapping of cellular differentiation hierarchies from single-cell data promises to empower systematic interrogations of vertebrate development and disease. Here we applied single-cell RNA sequencing to >92,000 cells from zebrafish embryos during the first day of development. Using a graph-based approach, we mapped a cell-state landscape that describes axis patterning, germ layer formation, and organogenesis. We tested how clonally related cells traverse this landscape by developing a transposon-based barcoding approach (TracerSeq) for reconstructing single-cell lineage histories. Clonally related cells were often restricted by the state landscape, including a case in which two independent lineages converge on similar fates. Cell fates remained restricted to this landscape in embryos lacking the chordin gene. We provide web-based resources for further analysis of the single-cell data.


2016 ◽  
Author(s):  
Gregory Giecold ◽  
Eugenio Marco ◽  
Lorenzo Trippa ◽  
Guo-Cheng Yuan

Single-cell gene expression data provide invaluable resources for systematic characterization of cellular hierarchy in multi-cellular organisms. However, cell lineage reconstruction is still often associated with significant uncertainty due to technological constraints. Such uncertainties have not been taken into account in current methods. We present ECLAIR, a novel computational method for the statistical inference of cell lineage relationships from single-cell gene expression data. ECLAIR uses an ensemble approach to improve the robustness of lineage predictions, and provides a quantitative estimate of the uncertainty of lineage branchings. We show that the application of ECLAIR to published datasets successfully reconstructs known lineage relationships and significantly improves the robustness of predictions. In conclusion, ECLAIR is a powerful bioinformatics tool for single-cell data analysis. It can be used for robust lineage reconstruction with a quantitative estimate of prediction accuracy.


2020 ◽  
Author(s):  
Ivan Croydon Veleslavov ◽  
Michael P.H. Stumpf

Abstract. Single cell transcriptomics has laid bare the heterogeneity of apparently identical cells at the level of gene expression. For many cell types we now know that there is variability in the abundance of many transcripts, and that average transcript abundance or average gene expression can be an unhelpful concept. A range of clustering and other classification methods have been proposed which use the signal in single cell data to classify cells, that is, to assign cell types to them based on their transcriptomic states. In many cases, however, we would like to have not just a classifier, but also a set of interpretable rules by which this classification occurs. Here we develop and demonstrate the interpretive power of one such approach, which sets out to establish a biologically interpretable classification scheme. In particular we are interested in capturing the chain of regulatory events that drive cell-fate decision making across a lineage tree or lineage sequence. We find that suitably defined decision trees can help to resolve gene regulatory programs involved in shaping lineage trees. Our approach combines predictive power with interpretability and can extract logical rules from single cell data.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Leila Baghaarabani ◽  
Sama Goliaei ◽  
Mohammad-Hadi Foroughmand-Araabi ◽  
Seyed Peyman Shariatpanahi ◽  
Bahram Goliaei

Abstract. Background: Genetic heterogeneity of a cancer tumor that develops during clonal evolution is one of the reasons for cancer treatment failure, by increasing the chance of drug resistance. Clones are cell populations with different genotypes, resulting from differences in somatic mutations that occur and accumulate during cancer development. An appropriate approach for identifying clones is determining the variant allele frequency of mutations that occurred in the tumor. Although bulk sequencing data can be used to provide that information, the frequencies are not informative enough for identifying different clones with the same prevalence and their evolutionary relationships. On the other hand, single-cell sequencing data provides valuable information about branching events in the evolution of a cancerous tumor. However, the temporal order of mutations may be determined with ambiguities using only single-cell data, while variant allele frequencies from bulk sequencing data can provide beneficial information for inferring the temporal order of mutations with fewer ambiguities. Result: In this study, a new method called Conifer (ClONal tree Inference For hEterogeneity of tumoR) is proposed which combines aggregated variant allele frequency from bulk sequencing data with branching event information from single-cell sequencing data to more accurately identify clones and their evolutionary relationships. It is proven that the accuracy of clone identification and clonal tree inference is increased by using Conifer compared to other existing methods on various sets of simulated data. In addition, it is discussed that the evolutionary tree provided by Conifer on real cancer data sets is highly consistent with information in both bulk and single-cell data. Conclusions: In this study, we have provided an accurate and robust method to identify clones of tumor heterogeneity and their evolutionary history by combining single-cell and bulk sequencing data.
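The limitation of bulk data mentioned in this abstract can be illustrated with a small sketch. It computes the variant allele frequency as the fraction of reads carrying the variant and groups mutations by that value. This is not Conifer's procedure. The function names, the rounding to two decimals, and the read counts are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads

def group_by_vaf(read_counts, decimals=2):
    """Group mutation names by rounded VAF; equal-VAF mutations land together."""
    groups = defaultdict(list)
    for name, (alt, total) in read_counts.items():
        vaf = round(variant_allele_frequency(alt, total), decimals)
        groups[vaf].append(name)
    return dict(groups)

# Hypothetical bulk read counts: (variant reads, total reads) per mutation.
counts = {"m1": (50, 100), "m2": (20, 100), "m3": (20, 100)}
groups = group_by_vaf(counts)
print(groups)
```

Here `m2` and `m3` fall into the same group, yet frequency alone cannot tell whether they belong to one clone or to two different clones with the same prevalence. That ambiguity is what the branching information from single-cell data is meant to resolve.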


2019 ◽  
Author(s):  
Hamim Zafar ◽  
Chieh Lin ◽  
Ziv Bar-Joseph

Abstract. Recent studies combine two novel technologies, single-cell RNA-sequencing and CRISPR-Cas9 barcode editing, for elucidating developmental lineages at the whole organism level. While these studies provided several insights, they face several computational challenges. First, lineages are reconstructed based on noisy and often saturated random mutation data. Additionally, due to the randomness of the mutations, lineages from multiple experiments cannot be combined to reconstruct a consensus lineage tree. To address these issues we developed a novel method, LinTIMaT, which reconstructs cell lineages using a maximum-likelihood framework by integrating mutation and expression data. Our analysis shows that expression data helps resolve the ambiguities arising when lineages are inferred based on mutations alone, while also enabling the integration of different individual lineages for the reconstruction of a consensus lineage tree. LinTIMaT lineages have better cell type coherence, improve the functional significance of gene sets and provide new insights on progenitors and differentiation pathways.


Author(s):  
David DeTomaso ◽  
Nir Yosef

Abstract. Two fundamental aims that emerge when analyzing single-cell RNA-seq data are identifying which genes vary in an informative manner and determining how these genes organize into modules. Here we propose a general approach to these problems that operates directly on a given metric of cell-cell similarity, allowing for its integration with any method (linear or nonlinear) for identifying the primary axes of transcriptional variation between cells. Additionally, we show that when using multimodal data, our procedure can be used to identify genes whose expression reflects alternative notions of similarity between cells, such as physical proximity in a tissue or clonal relatedness in a cell lineage tree. In this manner, we demonstrate that while our method, called Hotspot, is capable of identifying genes that reflect nuanced transcriptional variability between T helper cells, it can also identify spatially-dependent patterns of gene expression in the cerebellum as well as developmentally-heritable expression signatures during embryogenesis.


2018 ◽  
Author(s):  
Hazal Koptagel ◽  
Seong-Hwan Jun ◽  
Jens Lagergren

Abstract. Reconstruction of cell lineage trees from single-cell DNA sequencing data has the potential to become a fundamental tool in the study of the development of disease, in particular cancer. For cells without copy number alterations that have not been exposed to specific marking techniques, that is, normal cells, lineage tracing is naturally based on somatic point mutations. Current single cell sequencing techniques applicable to such cells require an amplification step, which introduces errors, and still often suffer from so-called allelic dropout. We present a detailed model of current technologies for the purpose of estimating the distance between cells without copy number changes, based on single-cell DNA sequencing data. The model is well suited for full Bayesian analysis by introducing prior probabilities for key parameters, as well as for maximum a posteriori estimation using the expectation maximization algorithm. Our model outputs the distance between two cells, simultaneously taking all the other cells into account. In particular, the model contains variables associated with pairs of loci, of which one is homozygous and the other heterozygous, and has the capacity to perform Bayesian probabilistic read phasing. By applying a fast distance-based method, such as FNJ, to the estimated distances, a cell lineage tree can be obtained. In contrast to MCMC-based methods, FNJ can easily handle data sets with tens of thousands of taxa. The high accuracy of the resulting method, called SCuPhr, is shown in studies of several synthetic data sets.
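The final step this abstract describes, turning estimated pairwise distances into a tree, can be sketched with classic neighbor joining. This stands in for FNJ, which is a faster variant and is not implemented here. The function name, the nested-dictionary distance format, the `internalN` node labels, and the four-cell distance matrix are assumptions for illustration. The distances are hypothetical and were chosen to be exactly tree-like.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Classic neighbor joining on a symmetric distance table dist[a][b].
    Returns (node, node, branch_length) edges of an unrooted tree."""
    # Copy the distances so the caller's table is not modified.
    d = {a: dict(dist[a]) for a in names}
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        total = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Pick the pair minimising the neighbor-joining Q criterion.
        i, j = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]],
        )
        new = f"internal{next_id}"
        next_id += 1
        # Branch lengths from the new internal node to the joined pair.
        len_i = 0.5 * d[i][j] + (total[i] - total[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        edges.append((new, i, len_i))
        edges.append((new, j, len_j))
        # Distances from the new node to every remaining node.
        d[new] = {}
        for k in active:
            if k not in (i, j):
                d_new_k = 0.5 * (d[i][k] + d[j][k] - d[i][j])
                d[new][k] = d_new_k
                d[k][new] = d_new_k
        active = [k for k in active if k not in (i, j)] + [new]
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges

# Hypothetical distances between four cells.
pairs = {("a", "b"): 3, ("a", "c"): 8, ("a", "d"): 9,
         ("b", "c"): 9, ("b", "d"): 10, ("c", "d"): 9}
names = ["a", "b", "c", "d"]
dist = {name: {} for name in names}
for (x, y), value in pairs.items():
    dist[x][y] = value
    dist[y][x] = value

edges = neighbor_joining(names, dist)
for edge in edges:
    print(edge)
```

On tree-like input such as this, neighbor joining recovers the branch lengths that generated the distances. With estimated, noisy distances the result is only an approximation, so the quality of the distance model matters.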


2018 ◽  
Author(s):  
Philipp Thomas

Growth pervades all areas of life from single cells to cell populations to tissues. However, cell size often fluctuates significantly from cell to cell and from generation to generation. Here we present a unified framework to predict the statistics of cell size variations within a lineage tree of a proliferating population. We analytically characterise (i) the distributions of cell size snapshots, (ii) the distribution within a population tree, and (iii) the distribution of lineages across the tree. Surprisingly, these size distributions differ significantly from observing single cells in isolation. In populations, cells seemingly grow to different sizes, typically exhibit less cell-to-cell variability and often display qualitatively different sensitivities to cell cycle noise and division errors. We demonstrate the key findings using recent single-cell data and elaborate on the implications for the ability of cells to maintain a narrow size distribution and the emergence of different power laws in these distributions.

