Inferring Cancer Progression from Single-cell Sequencing while Allowing Mutation Losses

2018 ◽  
Author(s):  
Simone Ciccolella ◽  
Mauricio Soto Gomez ◽  
Murray Patterson ◽  
Gianluca Della Vedova ◽  
Iman Hajirasouliha ◽  
...  

Abstract Motivation: In recent years, the well-known Infinite Sites Assumption (ISA) has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions seen as an accumulation of mutations. However, recent studies (Kuipers et al., 2017) leveraging Single-cell Sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. Still, established methods that can infer phylogenies with mutation losses are lacking. Results: We present the SASC (Simulated Annealing Single-Cell inference) tool, a new and robust approach based on simulated annealing for the inference of cancer progression from SCS data. More precisely, we introduce a simple extension of the model of evolution where mutations are only accumulated, by also allowing a limited number of back mutations in the evolutionary history of the tumor: the Dollo-k model. We demonstrate that SASC achieves high levels of accuracy when tested on both simulated and real data sets and in comparison with some other available methods. Availability: The Simulated Annealing Single-cell inference (SASC) tool is open source and available at https://github.com/sciccolella/[email protected]

Author(s):  
Simone Ciccolella ◽  
Camir Ricketts ◽  
Mauricio Soto Gomez ◽  
Murray Patterson ◽  
Dana Silverbush ◽  
...  

Abstract Motivation: In recent years, the well-known Infinite Sites Assumption has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions. However, recent studies leveraging single-cell sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. While there exist established computational methods that infer phylogenies with mutation losses, there remain some advancements to be made. Results: We present Simulated Annealing Single-Cell inference (SASC): a new and robust approach based on simulated annealing for the inference of cancer progression from SCS datasets. In particular, we introduce an extension of the model of evolution where mutations are only accumulated, by allowing also a limited amount of mutation loss in the evolutionary history of the tumor: the Dollo-k model. We demonstrate that SASC achieves high levels of accuracy when tested on both simulated and real datasets and in comparison with some other available methods. Availability and implementation: The SASC tool is open source and available at https://github.com/sciccolella/sasc. Supplementary information: Supplementary data are available at Bioinformatics online.
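As an illustration of the Dollo-k constraint named in the abstract above, the following minimal sketch assumes the usual reading of the model: each mutation is acquired exactly once and may be lost at most k times. The function name `satisfies_dollo_k` and the flat list of edge labels are hypothetical and are not taken from the SASC code base; the sketch also ignores tree topology (for example, that a loss must occur below the corresponding gain).

```python
from collections import Counter


def satisfies_dollo_k(events, k):
    """Check the Dollo-k counting constraint on a list of edge labels.

    events: iterable of (mutation, kind) pairs, kind being "gain" or "loss".
    Returns True if every mutation is gained exactly once and lost at most
    k times. Tree topology is deliberately not checked (assumption).
    """
    events = list(events)
    gains = Counter(m for m, kind in events if kind == "gain")
    losses = Counter(m for m, kind in events if kind == "loss")
    mutations = set(gains) | set(losses)
    return all(gains[m] == 1 and losses[m] <= k for m in mutations)


# Hypothetical edge labels: mutation "a" is gained once and lost once.
example_events = [("a", "gain"), ("b", "gain"), ("a", "loss")]
print(satisfies_dollo_k(example_events, k=1))
print(satisfies_dollo_k(example_events, k=0))
```

With k = 0 the check reduces to the accumulation-only model, in which no mutation is ever lost.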


2021 ◽  
Author(s):  
Jialu Hu ◽  
Yuanke Zhong ◽  
Xuequn Shang

Single-cell data provides new ways of discovering biological truth at the level of individual cells, such as identification of cellular sub-populations and cell development. With the development of single-cell sequencing technologies, a key analytical challenge is to integrate these data sets to uncover biological insights. Here, we developed a domain-adversarial and variational approximation framework, DAVAE, to integrate multiple single-cell data sets across samples, technologies and modalities without any post hoc data processing. We fit normalized gene expression into a non-linear model, which transforms a lower-dimensional latent variable into expression space with a non-linear function, a KL regularizer and a domain-adversarial regularizer. Results on five real data integration applications demonstrated the effectiveness and scalability of DAVAE in batch-effect removal, transfer learning, and cell type prediction for multiple single-cell data sets across samples, technologies and modalities. DAVAE was implemented in the toolkit package scbean in the PyPI repository, and the source code is also freely accessible at https://github.com/jhu99/scbean.
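The abstract above mentions a KL regularizer without giving its form. As a hedged illustration only, the sketch below assumes the standard variational-autoencoder choice: the KL divergence between a diagonal Gaussian posterior and a standard normal prior. Whether DAVAE uses exactly this term is an assumption, and the function name is hypothetical rather than part of scbean.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Closed form: 0.5 * sum(mu^2 + sigma^2 - 1 - log(sigma^2)).
    mu and log_var are equal-length sequences of floats.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


# A posterior equal to the prior incurs no penalty; shifting the mean does.
print(gaussian_kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
print(gaussian_kl_to_standard_normal([1.0], [0.0]))
```

In a training loop this term would typically be added to the reconstruction loss, pulling the latent codes toward the prior.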


Author(s):  
Salem Malikić ◽  
Farid Rashidi Mehrabadi ◽  
Erfan Sadeqi Azer ◽  
Mohammad Haghir Ebrahimabadi ◽  
S. Cenk Sahinalp

Abstract Single-cell sequencing data has great potential in reconstructing the evolutionary history of tumors. Rapid advances in single-cell sequencing technology in the past decade were followed by the design of various computational methods for inferring trees of tumor evolution. Some of the earliest of these methods were based on a direct search in the space of trees. However, it can be shown that instead of this tree search strategy we can perform a search in the space of binary matrices and obtain the most likely tree directly from the most likely among the candidate binary matrices. The search in the space of binary matrices can be expressed as an instance of integer linear or constraint satisfaction programming and solved by some of the available solvers, which typically provide a guarantee of optimality of the reported solution. In this review, we first describe one convenient tree representation of tumor evolutionary history and present the tree scoring model that is most commonly used in the available methods. We then provide a proof showing that the most likely tree of tumor evolution can be obtained directly from the most likely matrix from the space of candidate binary matrices. Next, we provide an integer linear programming formulation to search for such a matrix and summarize the existing methods based on this formulation or its extensions. Lastly, we present one use-case which illustrates how binary matrices can be used as a basis for developing a fast deep learning method for inferring some topological properties of the most likely tree of tumor evolution.
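To make the idea of "candidate binary matrices" concrete, the sketch below assumes that the candidates are the matrices admitting a perfect phylogeny under the infinite sites assumption; the abstract itself does not spell this out. Such matrices are characterized by the classical three-gametes condition: no pair of columns may show all three patterns (1,1), (1,0) and (0,1) across the rows. The function name `is_conflict_free` is hypothetical.

```python
from itertools import combinations


def is_conflict_free(matrix):
    """Return True if a binary cell-by-mutation matrix has no column conflict.

    matrix: list of rows (cells), each a list of 0/1 mutation indicators.
    Two columns conflict when some rows exhibit (1, 1), (1, 0) and (0, 1).
    """
    n_cols = len(matrix[0]) if matrix else 0
    forbidden = {(1, 1), (1, 0), (0, 1)}
    for i, j in combinations(range(n_cols), 2):
        patterns = {(row[i], row[j]) for row in matrix}
        if forbidden <= patterns:
            return False
    return True


# Nested mutations are compatible with a tree; crossing ones are not.
print(is_conflict_free([[1, 1], [1, 0], [0, 0]]))
print(is_conflict_free([[1, 0], [0, 1], [1, 1]]))
```

An integer linear program over such matrices would encode this pairwise condition as constraints rather than checking it after the fact.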


2021 ◽  
Author(s):  
Kylie Chen ◽  
David Welch ◽  
Alexei J. Drummond

Single-cell sequencing provides a new way to explore the evolutionary history of cancers. Compared to traditional bulk sequencing, which samples multiple heterogeneous cells, single-cell sequencing isolates and amplifies genetic material from a single cell. The ability to isolate a single cell makes it ideal for evolutionary inference. However, single-cell data is more error-prone due to the limited genomic material available per cell. Previous work using single-cell data to reconstruct the evolutionary history of cancers has not been integrated with standard evolutionary models. Here, we present error and mutation models for evolutionary inference of single-cell data within a mature and extensible Bayesian framework, BEAST2. Our framework enables integration with biologically informative models such as relaxed molecular clocks and population dynamic models. We reconstruct the phylogenetic history for a myeloproliferative cancer patient and two colorectal cancer patients. We find that the estimated times of terminal splitting events are shifted forward in time compared to models which ignore errors. Furthermore, we estimate 50% - 70% of the evolutionary distance between samples can be explained by sequencing error. Our simulation studies show that ignoring errors leads to inaccurate estimates of divergence times, mutation parameters and population parameters. Our work opens the potential for integrative Bayesian models capable of combining multiple sources of data.


2018 ◽  
Author(s):  
Simone Ciccolella ◽  
Mauricio Soto Gomez ◽  
Murray Patterson ◽  
Gianluca Della Vedova ◽  
Iman Hajirasouliha ◽  
...  

Abstract Motivation: In recent years, the well-known Infinite Sites Assumption (ISA) has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progression where mutations are accumulated through histories. However, some recent studies leveraging Single Cell Sequencing (SCS) techniques have shown evidence of mutation losses in several tumor samples [19], making the inference problem harder. Results: We present a new tool, gpps, that reconstructs a tumor phylogeny from single cell data, allowing each mutation to be lost at most a fixed number of times. Availability: The General Parsimony Phylogeny from Single cell (gpps) tool is open source and available at https://github.com/AlgoLab/gppf.


2020 ◽  
Author(s):  
Bitian Liu ◽  
Xiaonan Chen ◽  
Yunhong Zhan ◽  
Bin Wu ◽  
Shen Pan

Abstract Background: Cancer-associated fibroblasts (CAFs) are most abundant in stroma and are critically involved in cancer progression. However, the specific signature of CAFs and related clinicopathological parameters in renal cell carcinoma (RCC) remain unclear. Methods: In this work, methods using recognized gene signatures were employed to roughly assess the infiltration level of the stroma and CAFs in RCC based on the data in The Cancer Genome Atlas. Weighted gene co-expression network analysis (WGCNA) was used to cluster transcriptomes and correlate them with CAFs to identify specific markers. A comparison of fibroblast versus urothelial carcinoma cell lines and correlation with previously reported CAF markers were performed to demonstrate the specific expression of the gene signature. The gene signature was used to compare fibroblast infiltration of each sample through single sample gene set enrichment analysis, and the clinical significance of fibroblasts was analyzed via Cox risk assessment and the chi-square test. Finally, we used validation data to verify the clinical significance of the fibroblast gene signature in RCC. Results: Roughly calculated tumor matrix and CAF levels were significantly higher in kidney cancer than in normal tissues. More than 85% of fibroblast-specific markers identified by WGCNA were consistent with markers obtained via single-cell sequencing. These markers were more highly expressed in fibroblast cell lines and were significantly correlated with canonical CAF markers. Data validation also showed that CAFs were significantly correlated with survival and pathological grade. Conclusions: In summary, our findings indicate that the gene signature potentially serves as a biomarker of CAFs in RCC and that infiltration of fibroblasts in RCC is an independent prognostic factor associated with pathological grade and stage of tumor. The ability to recognize specific CAF markers using WGCNA is comparable to single-cell sequencing.


2017 ◽  
Vol 26 (20) ◽  
pp. 5541-5551 ◽  
Author(s):  
J. D. Medeiros ◽  
L. R. Leite ◽  
V. S. Pylro ◽  
F. S. Oliveira ◽  
V. M. Almeida ◽  
...  

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
N. Pierre Charrier ◽  
Axelle Hermouet ◽  
Caroline Hervet ◽  
Albert Agoulon ◽  
Stephen C. Barker ◽  
...  

Abstract Hard ticks are widely distributed across temperate regions, show strong variation in host associations, and are potential vectors of a diversity of medically important zoonoses, such as Lyme disease. To address unresolved issues with respect to the evolutionary relationships among certain species or genera, we produced novel RNA-Seq data sets for nine different Ixodes species. We combined these new data with 18 data sets obtained from public databases, both for Ixodes and non-Ixodes hard tick species, using soft ticks as an outgroup. We assembled transcriptomes (for 27 species in total), predicted coding sequences and identified single copy orthologues (SCO). Using Maximum-likelihood and Bayesian frameworks, we reconstructed a hard tick phylogeny for the nuclear genome. We also obtained a mitochondrial DNA-based phylogeny using published genome sequences and mitochondrial sequences derived from the new transcriptomes. Our results confirm previous studies showing that the Ixodes genus is monophyletic and clarify the relationships among Ixodes sub-genera. This work provides a baseline for studying the evolutionary history of ticks: we indeed found an unexpected acceleration of substitutions for mitochondrial sequences of Prostriata, and for nuclear and mitochondrial genes of two species of Rhipicephalus, which we relate to patterns of genome architecture and changes of life-cycle, respectively.


2021 ◽  
Author(s):  
Xianjie Huang ◽  
Yuanhua Huang

Abstract Summary: Single-cell sequencing is an increasingly used technology and has promising applications in basic research and clinical translation. However, genotyping methods developed for bulk sequencing data have not been well adapted for single-cell data, in terms of both computational parallelization and a simplified user interface. Here we introduce a software tool, cellsnp-lite, implemented in C/C++ and based on the well-supported package htslib, for genotyping in single-cell sequencing data for both droplet and well based platforms. On various experimental data sets, it shows substantial improvement in computational speed and memory efficiency while retaining highly concordant results compared to existing methods. Cellsnp-lite therefore lightens the genetic analysis for increasingly large single-cell data. Availability: The source code is freely available at https://github.com/single-cell-genetics/[email protected]

