SiFit: A Method for Inferring Tumor Trees from Single-Cell Sequencing Data under Finite-site Models

2016 ◽  
Author(s):  
Hamim Zafar ◽  
Anthony Tzen ◽  
Nicholas Navin ◽  
Ken Chen ◽  
Luay Nakhleh

Abstract: Single-cell sequencing (SCS) enables the inference of tumor phylogenies that provide insights into intra-tumor heterogeneity and evolutionary trajectories. Recently introduced methods perform this task under the infinite-sites assumption. Violations of this assumption, due to chromosomal deletions and loss of heterozygosity, necessitate the development of inference methods that utilize finite-sites models. We propose a statistical inference method for tumor phylogenies from noisy SCS data under a finite-sites model. The performance of our method on synthetic data, and on experimental datasets from two colorectal cancer patients used to trace evolutionary lineages in primary and metastatic tumors, suggests that employing a finite-sites model leads to improved inference of tumor phylogenies.
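The abstract does not spell out its error model. As a rough illustration of what "noisy SCS data" means for likelihood-based tree inference, the sketch below scores an observed binary genotype matrix against a candidate true genotype matrix, using a false-positive rate `alpha`, a false-negative rate `beta`, and skipping missing calls. This is a generic, commonly used error model and an assumption here, not SiFit's finite-sites likelihood (which must additionally account for deletions and loss of heterozygosity); the function name and the 0/1/`None` encoding are hypothetical.

```python
import math
from typing import List, Optional

def genotype_log_likelihood(
    observed: List[List[Optional[int]]],
    true_genotypes: List[List[int]],
    alpha: float,
    beta: float,
) -> float:
    """Log-likelihood of observed calls given candidate true genotypes.

    observed[c][m] is 0 (absent), 1 (present) or None (missing) for cell c, mutation m.
    alpha = P(observed 1 | true 0), the false-positive rate.
    beta  = P(observed 0 | true 1), the false-negative (dropout) rate.
    """
    log_lik = 0.0
    for obs_row, true_row in zip(observed, true_genotypes):
        for obs, tru in zip(obs_row, true_row):
            if obs is None:
                continue  # missing call: uninformative under this model
            if tru == 0:
                p = alpha if obs == 1 else 1.0 - alpha
            else:
                p = 1.0 - beta if obs == 1 else beta
            log_lik += math.log(p)
    return log_lik

# Two cells x three mutations; None marks a missing call (toy data).
observed = [[1, 0, None], [1, 1, 0]]
candidate_a = [[1, 0, 0], [1, 1, 0]]
# candidate_b treats cell 0's absent call at the second mutation as a false negative.
candidate_b = [[1, 1, 0], [1, 1, 0]]

ll_a = genotype_log_likelihood(observed, candidate_a, alpha=0.01, beta=0.2)
ll_b = genotype_log_likelihood(observed, candidate_b, alpha=0.01, beta=0.2)
```

A tree search would score each candidate tree by the genotypes it implies; here `candidate_a` scores higher because explaining the data needs no dropout event.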

2021 ◽  
Author(s):  
Leila Baghaarabani ◽  
Sama Goliaei ◽  
Mohammad-Hadi Foroughmand-Araabi ◽  
Seyed Peyman Shariatpanahi ◽  
Bahram Goliaei

Abstract Background: An important and effective step in cancer treatment is understanding the clonal evolution of cancer tumors. Clones are cell populations with different genotypes, resulting from differences in the somatic mutations that occur and accumulate during cancer development. An appropriate approach for better understanding a tumor population is determining the variant allele frequency of each mutation across the entire population. Bulk sequencing data can provide that information, but the frequencies alone are not informative enough to identify different clones and their evolutionary relationships. On the other hand, single-cell sequencing data provides valuable information about branching events in the evolution of a cancerous tumor. However, the number of cells sequenced in single-cell sequencing is naturally much smaller than in bulk sequencing, so it is not precise enough for calculating cell prevalence. Result: In this study, a new method called Conifer (ClONal tree Inference For hEterogeneity of tumoR) is proposed, which combines aggregated variant allele frequencies from bulk sequencing data with information on branching events from single-cell sequencing data to better understand clones and their evolutionary relationships. It is shown that Conifer identifies clones more accurately than other existing methods on both real and simulated data. It is also shown that Conifer's approach of using single-cell sequencing data together with bulk sequencing data reduces the possibility of grouping into one clone mutations that have similar frequencies but belong to different clones. Conclusions: In this study, we provide an accurate and robust method to identify the clones of a heterogeneous tumor and their evolutionary history by combining single-cell and bulk sequencing data.
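The abstract's point that variant allele frequencies (VAFs) alone can conflate mutations with similar frequencies from different clones can be made concrete with a toy example. Assuming a diploid genome, heterozygous mutations, no copy-number changes and 100% tumor purity, a mutation's expected VAF is half the total prevalence of the clones carrying it. In the sketch below (hypothetical clone names and prevalences), two sibling clones with equal prevalence give identical VAFs, while single-cell co-occurrence separates them. It illustrates the motivation only and is not Conifer's algorithm.

```python
from typing import Dict, List, Set

def expected_vaf(mutation: str, clones: Dict[str, Set[str]], prevalence: Dict[str, float]) -> float:
    """Expected VAF assuming diploid, heterozygous, copy-neutral mutations and full purity."""
    carrier_fraction = sum(prevalence[c] for c, muts in clones.items() if mutation in muts)
    return carrier_fraction / 2.0

def cooccur(m1: str, m2: str, cells: List[Set[str]]) -> bool:
    """True if any sequenced cell carries both mutations."""
    return any(m1 in cell and m2 in cell for cell in cells)

# Toy clonal tree: clone A carries m0; sibling clones B and C each add one mutation.
clones = {"A": {"m0"}, "B": {"m0", "m1"}, "C": {"m0", "m2"}}
prevalence = {"A": 0.2, "B": 0.4, "C": 0.4}

# Noise-free single-cell genotypes sampled from the clones.
cells = [clones["B"], clones["B"], clones["C"], clones["A"]]

vaf_m1 = expected_vaf("m1", clones, prevalence)  # 0.4 / 2 = 0.2
vaf_m2 = expected_vaf("m2", clones, prevalence)  # 0.4 / 2 = 0.2
```

A frequency-only clustering would place `m1` and `m2` together, whereas `cooccur("m1", "m2", cells)` is `False`, which points to two separate branches.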


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Leila Baghaarabani ◽  
Sama Goliaei ◽  
Mohammad-Hadi Foroughmand-Araabi ◽  
Seyed Peyman Shariatpanahi ◽  
Bahram Goliaei

Abstract Background: Genetic heterogeneity of a cancer tumor, which develops during clonal evolution, is one of the reasons for cancer treatment failure because it increases the chance of drug resistance. Clones are cell populations with different genotypes, resulting from differences in the somatic mutations that occur and accumulate during cancer development. An appropriate approach for identifying clones is determining the variant allele frequencies of the mutations that occurred in the tumor. Although bulk sequencing data can provide that information, the frequencies are not informative enough to identify different clones with the same prevalence or their evolutionary relationships. On the other hand, single-cell sequencing data provides valuable information about branching events in the evolution of a cancerous tumor. However, the temporal order of mutations may remain ambiguous when only single-cell data are used, while variant allele frequencies from bulk sequencing data can help infer the temporal order of mutations with fewer ambiguities. Result: In this study, a new method called Conifer (ClONal tree Inference For hEterogeneity of tumoR) is proposed, which combines aggregated variant allele frequencies from bulk sequencing data with branching event information from single-cell sequencing data to more accurately identify clones and their evolutionary relationships. It is shown that Conifer improves the accuracy of clone identification and clonal tree inference compared to other existing methods on various sets of simulated data. In addition, we discuss how the evolutionary tree inferred by Conifer on real cancer data sets is highly consistent with the information in both bulk and single-cell data. Conclusions: In this study, we have provided an accurate and robust method to identify the clones of a heterogeneous tumor and their evolutionary history by combining single-cell and bulk sequencing data.
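The abstract notes that bulk VAFs reduce ambiguity in the temporal order of mutations. One standard way to use them, assumed here rather than taken from Conifer, is the "sum condition": under diploid, heterozygous, copy-neutral, full-purity assumptions, the cancer cell fraction (CCF = 2 × VAF) of a mutation must be at least the sum of the CCFs of its child mutations. The sketch below checks a candidate mutation tree (a hypothetical `parent` map) against bulk VAFs; a tree that violates the condition can be discarded even when single-cell data alone cannot rule it out. The tolerance `tol` is an illustrative choice.

```python
from collections import defaultdict
from typing import Dict, List, Optional

def satisfies_sum_condition(
    parent: Dict[str, Optional[str]],
    vaf: Dict[str, float],
    tol: float = 0.05,
) -> bool:
    """Check the sum condition for a mutation tree.

    parent maps each mutation to its parent mutation (None for the root).
    CCF is taken as 2 * VAF; `tol` absorbs sampling noise in the bulk VAFs.
    """
    children: Dict[str, List[str]] = defaultdict(list)
    for mut, par in parent.items():
        if par is not None:
            children[par].append(mut)
    for mut, kids in children.items():
        ccf_parent = 2.0 * vaf[mut]
        ccf_kids = sum(2.0 * vaf[k] for k in kids)
        if ccf_kids > ccf_parent + tol:
            return False  # children would cover more cells than their ancestor
    return True

# Hypothetical bulk VAFs for three mutations.
vaf = {"m0": 0.48, "m1": 0.30, "m2": 0.25}

linear = {"m0": None, "m1": "m0", "m2": "m1"}      # m0 -> m1 -> m2
branching = {"m0": None, "m1": "m0", "m2": "m0"}   # m1 and m2 are siblings
reversed_order = {"m0": None, "m2": "m0", "m1": "m2"}  # m2 arises before m1
```

With these VAFs only `linear` passes: the siblings' CCFs (0.6 + 0.5) exceed that of `m0` (0.96), and `m1` (CCF 0.6) cannot descend from `m2` (CCF 0.5).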


2018 ◽  
Author(s):  
Pavel Skums ◽  
Vyacheslau Tsivina ◽  
Alex Zelikovsky

Abstract: Intra-tumor heterogeneity is one of the major factors influencing cancer progression and treatment outcome. However, the evolutionary dynamics of cancer clone populations remain poorly understood. Quantifying clonal selection and inferring the fitness landscapes of tumors are key steps toward understanding the evolutionary mechanisms driving cancer. These problems could be addressed using single cell sequencing, which provides unprecedented insight into intra-tumor heterogeneity, making it possible to study and quantify the selective advantages of individual clones. Here we present SCIFIL, a computational tool for inferring fitness landscapes of heterogeneous cancer clone populations from single cell sequencing data. SCIFIL estimates maximum likelihood fitnesses of clone variants and measures their selective advantages and order of appearance by fitting an evolutionary model to the tumor phylogeny. We demonstrate the accuracy and utility of our approach on simulated and experimental data. SCIFIL can be used to provide new insight into the evolutionary dynamics of cancer. Its source code is available at https://github.com/compbel/SCIFIL


2019 ◽  
Vol 35 (14) ◽  
pp. i398-i407 ◽  
Author(s):  
Pavel Skums ◽  
Viachaslau Tsyvina ◽  
Alex Zelikovsky

Abstract Summary: Intra-tumor heterogeneity is one of the major factors influencing cancer progression and treatment outcome. However, the evolutionary dynamics of cancer clone populations remain poorly understood. Quantifying clonal selection and inferring the fitness landscapes of tumors are key steps toward understanding the evolutionary mechanisms driving cancer. These problems could be addressed using single-cell sequencing (scSeq), which provides unprecedented insight into intra-tumor heterogeneity, making it possible to study and quantify the selective advantages of individual clones. Here, we present Single Cell Inference of FItness Landscape (SCIFIL), a computational tool for inferring fitness landscapes of heterogeneous cancer clone populations from scSeq data. SCIFIL estimates maximum likelihood fitnesses of clone variants and measures their selective advantages and order of appearance by fitting an evolutionary model to the tumor phylogeny. We demonstrate the accuracy of our approach and show how it could be applied to experimental tumor data to study clonal selection and infer evolutionary history. SCIFIL can be used to provide new insight into the evolutionary dynamics of cancer. Availability and implementation: Its source code is available at https://github.com/compbel/SCIFIL.
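SCIFIL's evolutionary model is not described in the abstract. To make "maximum likelihood fitness" and "selective advantage" concrete, the sketch below uses a deliberately simple, hypothetical model: a clone that appears at time `t_i` grows deterministically from one cell as exp(f_i (T - t_i)), and its observed cell count at sampling time T is Poisson with that mean. For a single observed count k > 0 the Poisson MLE of the mean is k, so the fitness estimate is ln(k) / (T - t_i). Selective advantage is taken here as a clone's fitness minus its parent's. Unlike SCIFIL, which infers the order of appearance, this toy takes appearance times as given inputs.

```python
import math
from typing import Dict, Optional, Tuple

def fitness_mle(count: int, appear_time: float, sample_time: float) -> float:
    """Toy model: count ~ Poisson(exp(f * (sample_time - appear_time))).

    With one Poisson observation the MLE of the mean is the count itself,
    so by the invariance of MLEs f_hat = ln(count) / elapsed time.
    """
    if count <= 0:
        raise ValueError("f is not identifiable from a zero count in this toy model")
    elapsed = sample_time - appear_time
    if elapsed <= 0:
        raise ValueError("a clone must appear before the sampling time")
    return math.log(count) / elapsed

def selective_advantages(
    clones: Dict[str, Tuple[Optional[str], int, float]],
    sample_time: float,
) -> Dict[str, float]:
    """clones maps name -> (parent or None, observed cell count, appearance time).

    Returns, for each non-root clone, its estimated fitness minus its parent's.
    """
    fitness = {
        name: fitness_mle(count, t_appear, sample_time)
        for name, (_, count, t_appear) in clones.items()
    }
    return {
        name: fitness[name] - fitness[parent]
        for name, (parent, _, _) in clones.items()
        if parent is not None
    }

# Hypothetical phylogeny: A is the root clone; B and C descend from A.
clones = {"A": (None, 1000, 0.0), "B": ("A", 500, 5.0), "C": ("A", 50, 2.0)}
advantages = selective_advantages(clones, sample_time=10.0)
```

Here B reaches a large size in a short time and gets a positive advantage over A, while C grows more slowly than its parent and gets a negative one.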


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A520-A520
Author(s):  
Son Pham ◽  
Tri Le ◽  
Tan Phan ◽  
Minh Pham ◽  
Huy Nguyen ◽  
...  

Background: Single-cell sequencing technology has opened unprecedented opportunities to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis and therapeutic resistance, which facilitate target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a single immuno-oncology study can produce multi-omics profiles for several thousand individual cells. This flood of single-cell data poses formidable challenges, including standardizing data formats across studies and performing reanalysis of individual datasets and meta-analysis. Methods: N/A Results: We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform currently hosts a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc., so that they can be instantly accessed and explored in a uniform visualization and analytics interface. Based on this massive curated database, BioTuring Browser supports searching for similar expression profiles, querying a target across datasets and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com. Conclusions: N/A


2019 ◽  
Author(s):  
Emily F. Davis-Marcisak ◽  
Pranay Orugunta ◽  
Genevieve Stein-O'Brien ◽  
Sidharth V. Puram ◽  
Evanthia Roussos Torres ◽  
...  

2019 ◽  
Author(s):  
Simone Ciccolella ◽  
Murray Patterson ◽  
Paola Bonizzoni ◽  
Gianluca Della Vedova

Abstract Background: Single cell sequencing (SCS) technologies provide a level of resolution that makes them indispensable for inferring, from a sequenced tumor, evolutionary trees or phylogenies representing the accumulation of cancerous mutations. A drawback of SCS is its elevated false negative and missing value rates, which result in a large space of possible solutions and, in turn, make some approaches and tools infeasible. While this has not inhibited the development of methods for inferring phylogenies from SCS data, the continuing increase in the size and resolution of these data is beginning to strain such methods. One possible solution is to reduce the size of an SCS instance, usually represented as a matrix of presence, absence and missing values of the mutations found in the different sequenced cells, and infer the tree from this reduced-size instance. Previous approaches have used k-means to this end, clustering groups of mutations and/or cells and using these means as the reduced instance. Such an approach typically uses the Euclidean distance for computing means. However, since the values in these matrices are categorical (having the three categories present, absent and missing), we apply techniques for clustering categorical data, commonly used in data mining and machine learning, to SCS data with this goal in mind. Results: In this work, we present celluloid, a new clustering procedure aimed at clustering categorical vector or matrix data, here representing SCS instances.
We demonstrate that celluloid clusters mutations with high precision, never pairing too many mutations that are unrelated in the ground truth, and that it also obtains accurate results in terms of the phylogeny inferred downstream from the reduced instance it produces. Finally, we demonstrate the usefulness of a clustering step by applying the entire pipeline (clustering + inference method) to a real dataset, showing a significant reduction in runtime and considerably raising the upper bound on the size of SCS instances that can be solved in practice. Availability: Our approach, celluloid (clustering single cell sequencing data around centroids), is available at https://github.com/AlgoLab/celluloid/ under an MIT license.
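Since the abstract contrasts Euclidean k-means with methods for categorical data, here is a minimal k-modes sketch (k-modes is one standard technique for clustering categorical vectors) applied to mutation columns with values 0 (absent), 1 (present) and 2 (missing). The dissimilarity used here, which counts positions where both vectors are observed and disagree so that missing acts as a wildcard, and the deterministic "first k vectors" initialisation are simplifying assumptions. celluloid's actual dissimilarity and initialisation may differ, so consult its repository for the real procedure.

```python
from collections import Counter
from typing import List, Sequence, Tuple

ABSENT, PRESENT, MISSING = 0, 1, 2

def dissimilarity(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions where both values are observed and they disagree."""
    return sum(1 for x, y in zip(a, b) if x != MISSING and y != MISSING and x != y)

def mode_vector(vectors: List[Sequence[int]]) -> List[int]:
    """Per-position most frequent observed value; MISSING if nothing is observed."""
    mode = []
    for i in range(len(vectors[0])):
        counts = Counter(v[i] for v in vectors if v[i] != MISSING)
        mode.append(counts.most_common(1)[0][0] if counts else MISSING)
    return mode

def k_modes(vectors: List[Sequence[int]], k: int, max_iter: int = 20) -> Tuple[List[int], List[List[int]]]:
    """Assign each vector to its nearest mode, then recompute modes, until labels stop changing."""
    modes = [list(v) for v in vectors[:k]]  # naive deterministic initialisation
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dissimilarity(v, modes[j])) for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[j] = mode_vector(members)
    return labels, modes

# Mutation columns over four cells (toy data).
vectors = [
    [1, 1, 0, 0],        # m1
    [0, 0, 1, 1],        # m3
    [1, MISSING, 0, 0],  # m2: like m1 with one missing call
    [0, 0, 1, MISSING],  # m4: like m3 with one missing call
]
labels, modes = k_modes(vectors, k=2)
```

Under Euclidean distance the missing code 2 would be treated as a numeric value far from both 0 and 1; the categorical dissimilarity instead ignores it, so `m2` joins `m1` and `m4` joins `m3`, and each cluster's mode can serve as one column of the reduced instance.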


2019 ◽  
Author(s):  
Christina Huan Shi ◽  
Kevin Y. Yip

Abstract: K-mer counting has many applications in sequencing data processing and analysis. However, sequencing errors can produce many false k-mers that substantially increase the memory requirement during counting. We propose a fast k-mer counting method, CQF-deNoise, which has a novel component for dynamically identifying and removing false k-mers while preserving counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consumed 49-76% less memory than the second-best method while still running competitively fast. The k-mer counts from CQF-deNoise produced cell clusters from single-cell RNA-seq data that were highly consistent with those from CellRanger but required only 5% of the running time at the same memory consumption, suggesting that CQF-deNoise can be used to preview cell clusters for early detection of potential data problems before running a much more time-consuming full analysis pipeline.
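The idea of dynamically removing false k-mers during counting can be sketched with a plain Python dictionary standing in for the counting quotient filter. The following choices are assumptions, not the paper's design: k-mers are canonicalised (the lexicographically smaller of a k-mer and its reverse complement), and whenever the table grows past a hypothetical `capacity`, every k-mer seen only once is dropped, on the reasoning that error-induced k-mers are usually rare. A true k-mer pruned early restarts from zero, which is exactly the accuracy trade-off such a scheme has to manage.

```python
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def count_kmers_with_pruning(reads: Iterable[str], k: int, capacity: int) -> Dict[str, int]:
    """Count canonical k-mers, dropping singletons whenever the table exceeds its limit."""
    counts: Dict[str, int] = {}
    limit = capacity
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:
                continue  # skip k-mers containing ambiguous bases
            key = canonical(kmer)
            counts[key] = counts.get(key, 0) + 1
            if len(counts) > limit:
                # Likely-erroneous k-mers are those seen only once so far.
                counts = {km: c for km, c in counts.items() if c > 1}
                # If pruning freed little space, raise the limit to avoid re-pruning on every new k-mer.
                limit = max(limit, 2 * len(counts))
    return counts

# Toy reads: the third read carries a substitution that creates novel, rare k-mers.
reads = ["ACGTACGT", "ACGTACGT", "ACGTTCGT"]
pruned = count_kmers_with_pruning(reads, k=4, capacity=5)
exact = count_kmers_with_pruning(reads, k=4, capacity=1_000)
```

In this run three error k-mers are discarded when the table overflows, but the last one (`ACGA`) survives because pruning only fires on overflow; a final minimum-count filter would be a natural addition.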

