SCIΦ: Single-cell mutation identification via phylogenetic inference

Mapping Intimacies ◽

10.1101/290908 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jochen Singer ◽

Jack Kuipers ◽

Katharina Jahn ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

Lymphoblastic Leukemia ◽

Evolutionary Relationship ◽

Simulated Data ◽

Error Rates ◽

Cancer Therapies ◽

Sequencing Data ◽

Allelic Dropout ◽

Single Cell Sequencing ◽

Real World Datasets

AbstractUnderstanding the evolution of cancer is important for the development of appropriate cancer therapies. The task is challenging because tumors evolve as heterogeneous cell populations with an unknown number of genetically distinct subclones of varying frequencies. Conventional approaches based on bulk sequencing are limited in addressing this challenge as clones cannot be observed directly. Single-cell sequencing holds the promise of resolving the heterogeneity of tumors; however, it has its own challenges including elevated error rates, allelic dropout, and uneven coverage. Here, we develop a new approach to mutation detection in individual tumor cells by leveraging the evolutionary relationship among cells. Our method, called SCIΦ, jointly calls mutations in individual cells and estimates the tumor phylogeny among these cells. Employing a Markov Chain Monte Carlo scheme we robustly account for the various sources of noise in single-cell sequencing data. Our approach enables us to reliably call mutations in each single cell even in experiments with high dropout rates and missing data. We show that SCIΦ outperforms existing methods on simulated data and applied it to different real-world datasets, namely a whole exome breast cancer as well as a panel acute lymphoblastic leukemia dataset. Availability: https://github.com/cbg-ethz/SCIPhI

Download Full-text

A Combinatorial Approach for Single-cell Variant Detection via Phylogenetic Inference

10.1101/693960 ◽

2019 ◽

Author(s):

Mohammadamin Edrisi ◽

Hamim Zafar ◽

Luay Nakhleh

Keyword(s):

Single Cell ◽

Human Cancer ◽

Evolutionary Relationship ◽

Error Rates ◽

Intratumor Heterogeneity ◽

Combinatorial Approach ◽

Cancer Dataset ◽

Inference Problem ◽

Single Cell Sequencing ◽

Joint Inference

AbstractSingle-cell sequencing provides a powerful approach for elucidating intratumor heterogeneity by resolving cell-to-cell variability. However, it also poses additional challenges including elevated error rates, allelic dropout and non-uniform coverage. A recently introduced single-cell-specific mutation detection algorithm leverages the evolutionary relationship between cells for denoising the data. However, due to its probabilistic nature, this method does not scale well with the number of cells. Here, we develop a novel combinatorial approach for utilizing the genealogical relationship of cells in detecting mutations from noisy single-cell sequencing data. Our method, called scVILP, jointly detects mutations in individual cells and reconstructs a perfect phylogeny among these cells. We employ a novel Integer Linear Program algorithm for deterministically and efficiently solving the joint inference problem. We show that scVILP achieves similar or better accuracy but significantly better runtime over existing methods on simulated data. We also applied scVILP to an empirical human cancer dataset from a high grade serous ovarian cancer patient.

Download Full-text

A review of computational strategies for denoising and imputation of single-cell transcriptomic data

Briefings in Bioinformatics ◽

10.1093/bib/bbaa222 ◽

2020 ◽

Author(s):

Lucrezia Patruno ◽

Davide Maspero ◽

Francesco Craighero ◽

Fabrizio Angaroni ◽

Marco Antoniotti ◽

...

Keyword(s):

Single Cell ◽

Data Science ◽

Performance Metrics ◽

Expression Profiles ◽

Computation Time ◽

Sequencing Data ◽

Transcriptomic Data ◽

Single Cell Sequencing ◽

Real World Datasets

Abstract Motivation The advancements of single-cell sequencing methods have paved the way for the characterization of cellular states at unprecedented resolution, revolutionizing the investigation on complex biological systems. Yet, single-cell sequencing experiments are hindered by several technical issues, which cause output data to be noisy, impacting the reliability of downstream analyses. Therefore, a growing number of data science methods has been proposed to recover lost or corrupted information from single-cell sequencing data. To date, however, no quantitative benchmarks have been proposed to evaluate such methods. Results We present a comprehensive analysis of the state-of-the-art computational approaches for denoising and imputation of single-cell transcriptomic data, comparing their performance in different experimental scenarios. In detail, we compared 19 denoising and imputation methods, on both simulated and real-world datasets, with respect to several performance metrics related to imputation of dropout events, recovery of true expression profiles, characterization of cell similarity, identification of differentially expressed genes and computation time. The effectiveness and scalability of all methods were assessed with regard to distinct sequencing protocols, sample size and different levels of biological variability and technical noise. As a result, we identify a subset of versatile approaches exhibiting solid performances on most tests and show that certain algorithmic families prove effective on specific tasks but inefficient on others. Finally, most methods appear to benefit from the introduction of appropriate assumptions on noise distribution of biological processes.

Download Full-text

Tree inference for single-cell data

10.1101/047795 ◽

2016 ◽

Author(s):

Katharina Jahn ◽

Jack Kuipers ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

Search Algorithm ◽

Single Cells ◽

Error Rates ◽

Cancer Therapies ◽

Sequencing Data ◽

Cancer Data ◽

Posterior Probability Distribution ◽

Tree Inference ◽

History Of

AbstractUnderstanding the mutational heterogeneity within tumours is a keystone for the development of efficient cancer therapies. Here, we present SCITE, a stochastic search algorithm to identify the evolutionary history of a tumour from noisy and incomplete mutation profiles of single cells. SCITE comprises a exible MCMC sampling scheme that allows the user to compute the maximum-likelihood mutation history, to sample from the posterior probability distribution, and to estimate the error rates of the underlying sequencing experiments. Evaluation on real cancer data and on simulation studies shows the scalability of SCITE to present-day single-cell sequencing data and improved reconstruction accuracy compared to existing approaches.

Download Full-text

Tumor Copy Number Deconvolution Integrating Bulk and Single-Cell Sequencing Data

10.1101/519892 ◽

2019 ◽

Cited By ~ 2

Author(s):

Haoyun Lei ◽

Bochuan Lyu ◽

E. Michael Gertz ◽

Alejandro A. Schäffer ◽

Xulian Shi ◽

...

Keyword(s):

Single Cell ◽

Copy Number ◽

Simulated Data ◽

Mixed Integer ◽

Intratumor Heterogeneity ◽

Tumor Evolution ◽

Sequencing Data ◽

Minimum Evolution ◽

Promising Alternative ◽

Single Cell Sequencing

AbstractCharacterizing intratumor heterogeneity (ITH) is crucial to understanding cancer development, but it is hampered by limits of available data sources. Bulk DNA sequencing is the most common technology to assess ITH, but mixes many genetically distinct cells in each sample, which must then be computationally deconvolved. Single-cell sequencing (SCS) is a promising alternative, but its limitations — e.g., high noise, difficulty scaling to large populations, technical artifacts, and large data sets — have so far made it impractical for studying cohorts of sufficient size to identify statistically robust features of tumor evolution. We have developed strategies for deconvolution and tumor phylogenetics combining limited amounts of bulk and single-cell data to gain some advantages of single-cell resolution with much lower cost, with specific focus on deconvolving genomic copy number data. We developed a mixed membership model for clonal deconvolution via non-negative matrix factorization (NMF) balancing deconvolution quality with similarity to single-cell samples via an associated efficient coordinate descent algorithm. We then improve on that algorithm by integrating deconvolution with clonal phylogeny inference, using a mixed integer linear programming (MILP) model to incorporate a minimum evolution phylogenetic tree cost in the problem objective. We demonstrate the effectiveness of these methods on semi-simulated data of known ground truth, showing improved deconvolution accuracy relative to bulk data alone.

Download Full-text

CellCoal: Coalescent Simulation of Single-Cell Sequencing Samples

Molecular Biology and Evolution ◽

10.1093/molbev/msaa025 ◽

2020 ◽

Vol 37 (5) ◽

pp. 1535-1542 ◽

Cited By ~ 1

Author(s):

David Posada

Keyword(s):

Single Cell ◽

Single Cells ◽

Software Tool ◽

Coalescent Simulation ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Allelic Dropout ◽

Flexible Tool ◽

Single Cell Sequencing ◽

Somatic Evolution

Abstract Our capacity to study individual cells has enabled a new level of resolution for understanding complex biological systems such as multicellular organisms or microbial communities. Not surprisingly, several methods have been developed in recent years with a formidable potential to investigate the somatic evolution of single cells in both healthy and pathological tissues. However, single-cell sequencing data can be quite noisy due to different technical biases, so inferences resulting from these new methods need to be carefully contrasted. Here, I introduce CellCoal, a software tool for the coalescent simulation of single-cell sequencing genotypes. CellCoal simulates the history of single-cell samples obtained from somatic cell populations with different demographic histories and produces single-nucleotide variants under a variety of mutation models, sequencing read counts, and genotype likelihoods, considering allelic imbalance, allelic dropout, amplification, and sequencing errors, typical of this type of data. CellCoal is a flexible tool that can be used to understand the implications of different somatic evolutionary processes at the single-cell level, and to benchmark dedicated bioinformatic tools for the analysis of single-cell sequencing data. CellCoal is available at https://github.com/dapogon/cellcoal.

Download Full-text

Single-cell tumor phylogeny inference with copy-number constrained mutation losses

10.1101/840355 ◽

2019 ◽

Cited By ~ 1

Author(s):

Gryte Satas ◽

Simone Zaccaria ◽

Geoffrey Mon ◽

Benjamin J. Raphael

Keyword(s):

Single Cell ◽

Copy Number ◽

Phylogenetic Trees ◽

Colorectal Cancer Patient ◽

Simulated Data ◽

Cell Tumor ◽

Tumor Evolution ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Cell Sequencing

AbstractMotivationSingle-cell DNA sequencing enables the measurement of somatic mutations in individual tumor cells, and provides data to reconstruct the evolutionary history of the tumor. Nearly all existing methods to construct phylogenetic trees from single-cell sequencing data use single-nucleotide variants (SNVs) as markers. However, most solid tumors contain copy-number aberrations (CNAs) which can overlap loci containing SNVs. Particularly problematic are CNAs that delete an SNV, thus returning the SNV locus to the unmutated state. Such mutation losses are allowed in some models of SNV evolution, but these models are generally too permissive, allowing mutation losses without evidence of a CNA overlapping the locus.ResultsWe introduce a novel loss-supported evolutionary model, a generalization of the infinite sites and Dollo models, that constrains mutation losses to loci with evidence of a decrease in copy number. We design a new algorithm, Single-Cell Algorithm for Reconstructing the Loss-supported Evolution of Tumors (Scarlet), that infers phylogenies from single-cell tumor sequencing data using the loss-supported model and a probabilistic model of sequencing errors and allele dropout. On simulated data, we show that Scarlet outperforms current single-cell phylogeny methods, recovering more accurate trees and correcting errors in SNV data. On single-cell sequencing data from a metastatic colorectal cancer patient, Scarlet constructs a phylogeny that is both more consistent with the observed copy-number data and also reveals a simpler monooclonal seeding of the metastasis, contrasting with published reports of polyclonal seeding in this patient. Scarlet substantially improves single-cell phylogeny inference in tumors with CNAs, yielding new insights into the analysis of tumor evolution.AvailabilitySoftware is available at github.com/raphael-group/[email protected]

Download Full-text

SiCloneFit: Bayesian inference of population structure, genotype,and phylogeny of tumor clones from single-cell genome sequencing data

10.1101/394262 ◽

2018 ◽

Cited By ~ 3

Author(s):

Hamim Zafar ◽

Nicholas Navin ◽

Ken Chen ◽

Luay Nakhleh

Keyword(s):

Single Cell ◽

Single Cells ◽

Evolutionary Relationship ◽

Sequencing Data ◽

Allelic Dropout ◽

Site Model ◽

Joint Reconstruction ◽

Technical Errors ◽

Gibbs Sampling Algorithm ◽

Cell Genome

AbstractAccumulation and selection of somatic mutations in a Darwinian framework result in intra-tumor heterogeneity (ITH) that poses significant challenges to the diagnosis and clinical therapy of cancer. Identification of the tumor cell populations (clones) and reconstruction of their evolutionary relationship can elucidate this heterogeneity. Recently developed single-cell DNA sequencing (SCS) technologies promise to resolve ITH to a single-cell level. However, technical errors in SCS datasets, including false-positives (FP), false-negatives (FN) due to allelic dropout and cell doublets, significantly complicate these tasks. Here, we propose a non-parametric Bayesian method that reconstructs the clonal populations as clusters of single cells, genotypes of each clone and the evolutionary relationships between the clones. It employs a tree-structured Chinese restaurant process as the prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-site model of evolution to account for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and cell doublets are modeled by employing a Beta-binomial distribution. We develop a Gibbs sampling algorithm comprising of partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental datasets suggests that joint reconstruction of tumor clones and clonal phylogeny under a finite-site model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support of the inferences it makes.

Download Full-text

484 Bioturing browser: interactively explore public single cell sequencing data

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0484 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A520-A520

Author(s):

Son Pham ◽

Tri Le ◽

Tan Phan ◽

Minh Pham ◽

Huy Nguyen ◽

...

Keyword(s):

Single Cell ◽

Immune Cell ◽

Expression Profiles ◽

Meta Analysis ◽

Cell Types ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Data Formats ◽

Cancer Types ◽

Cell Data

BackgroundSingle-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into the intratumoral heterogeneity, metastasis, therapeutic resistance, which facilitates target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a particular immuno-oncology study can produce multi-omics profiles for several thousands of individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies, performing reanalysis for individual datasets and meta-analysis.MethodsN/AResultsWe present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform is currently hosting a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes, and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc. to be instantly accessed and explored in a uniform visualization and analytics interface. Based on this massive curated database, BioTuring Browser supports searching similar expression profiles, querying a target across datasets and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com.ConclusionsN/A

Download Full-text

Reference-free inference of tumor phylogenies from single-cell sequencing data

2014 IEEE 4th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) ◽

10.1109/iccabs.2014.6863944 ◽

2014 ◽

Author(s):

Ayshwarya Subramanian ◽

Russell Schwartz

Keyword(s):

Single Cell ◽

Sequencing Data ◽

Single Cell Sequencing

Download Full-text

Effective clustering for single cell sequencing cancer data

10.1101/586545 ◽

2019 ◽

Cited By ~ 2

Author(s):

Simone Ciccolella ◽

Murray Patterson ◽

Paola Bonizzoni ◽

Gianluca Della Vedova

Keyword(s):

Single Cell ◽

Categorical Data ◽

Euclidean Distance ◽

Missing Values ◽

False Negative ◽

Ground Truth ◽

Sequencing Data ◽

Large Space ◽

Cancer Data ◽

Single Cell Sequencing

AbstractBackgroundSingle cell sequencing (SCS) technologies provide a level of resolution that makes it indispensable for inferring from a sequenced tumor, evolutionary trees or phylogenies representing an accumulation of cancerous mutations. A drawback of SCS is elevated false negative and missing value rates, resulting in a large space of possible solutions, which in turn makes infeasible using some approaches and tools. While this has not inhibited the development of methods for inferring phylogenies from SCS data, the continuing increase in size and resolution of these data begin to put a strain on such methods.One possible solution is to reduce the size of an SCS instance — usually represented as a matrix of presence, absence and missing values of the mutations found in the different sequenced cells — and infer the tree from this reduced-size instance. Previous approaches have used k-means to this end, clustering groups of mutations and/or cells, and using these means as the reduced instance. Such an approach typically uses the Euclidean distance for computing means. However, since the values in these matrices are of a categorical nature (having the three categories: present, absent and missing), we explore techniques for clustering categorical data — commonly used in data mining and machine learning — to SCS data, with this goal in mind.ResultsIn this work, we present a new clustering procedure aimed at clustering categorical vector, or matrix data — here representing SCS instances, called celluloid. We demonstrate that celluloid clusters mutations with high precision: never pairing too many mutations that are unrelated in the ground truth, but also obtains accurate results in terms of the phylogeny inferred downstream from the reduced instance produced by this method.Finally, we demonstrate the usefulness of a clustering step by applying the entire pipeline (clustering + inference method) to a real dataset, showing a significant reduction in the runtime, raising considerably the upper bound on the size of SCS instances which can be solved in practice.AvailabilityOur approach, celluloid: clustering single cell sequencing data around centroids is available at https://github.com/AlgoLab/celluloid/ under an MIT license.

Download Full-text