Tumor subclonal progression model for cancer hallmark acquisition

2017
Author(s):
Yusuke Matsui
Satoru Miyano
Teppei Shimamura

Abstract Recent advances in methods for reconstructing cancer evolutionary trajectories have opened up the prospect of deciphering the subclonal populations and their evolutionary architectures within cancer ecosystems. An important challenge in cancer evolution studies is how to connect genetic aberrations in subclones to clinically interpretable and actionable targets for individual patients. In this study, our aim is to develop a novel method for constructing a model of tumor subclonal progression in terms of cancer hallmark acquisition using multiregional sequencing data. We prepare a subclonal evolutionary tree inferred from variant allele frequencies and estimate pathway alteration probabilities from large-scale cohort genomic data. We then construct an evolutionary tree of pathway alterations that takes into account the selectivity of pathway alterations via a selectivity score. We show the effectiveness of our method on a dataset of clear cell renal cell carcinomas.
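As an illustrative sketch of the pathway-lifting step described above, the following snippet maps gene-level mutations in a subclone to pathway alterations weighted by cohort-derived alteration probabilities. The gene sets and probability values are hypothetical placeholders, and this is not the authors' actual model:

```python
# Hypothetical pathway memberships and cohort-derived alteration probabilities
# (illustrative values only, not from the paper).
PATHWAY_GENES = {"RTK/RAS": {"EGFR", "KRAS"}, "PI3K": {"PIK3CA", "PTEN"}}
PATHWAY_PROB = {"RTK/RAS": 0.6, "PI3K": 0.3}

def pathway_scores(subclone_mutations):
    """Lift a subclone's mutated genes to pathway level, scoring each hit
    pathway by its cohort alteration probability."""
    muts = set(subclone_mutations)
    return {pathway: PATHWAY_PROB[pathway]
            for pathway, genes in PATHWAY_GENES.items()
            if genes & muts}
```

A subclone mutated in KRAS would then register only an RTK/RAS pathway alteration, which is the unit the evolutionary tree of pathway alterations is built from.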

Blood
2020
Vol 136 (Supplement 1)
pp. 37-37
Author(s):
Kimberly Skead
Armande Ang Houle
Sagi Abelson
Marie-Julie Fave
Boxi Lin
...

The age-associated accumulation of somatic mutations and large-scale structural variants (SVs) in the early hematopoietic hierarchy has been linked to premalignant stages of cancer and cardiovascular disease (CVD). However, only a small proportion of individuals harboring these mutations progress to disease, and the mechanisms driving the transformation to malignancy remain unclear. Hematopoietic evolution, and cancer evolution more broadly, has largely been studied through a lens of adaptive evolution, and the contribution of functionally neutral or mildly damaging mutations to early disease-associated clonal expansions has not been well characterised despite comprising the majority of the mutational burden in healthy and tumoural tissues. By combining deep learning with population genetics, we interrogate the hematopoietic system to capture signatures of selection acting in healthy and pre-cancerous blood populations. Here, we leverage high-coverage sequencing data from healthy and pre-cancerous individuals from the European Prospective Investigation into Cancer and Nutrition Study (n=477) and dense genotyping from the Canadian Partnership for Tomorrow's Health (n=5,000) to show that blood rejects the paradigm of strictly adaptive or neutral evolution and is subject to pervasive negative selection. We observe clear age associations across hematopoietic populations, with the dominant class of selection driving evolutionary dynamics acting at an individual level. We find that both the location and the ratio of passenger to driver mutations are critical in determining whether positive selection acting on driver mutations is able to overwhelm regulated hematopoiesis and allow clones harbouring disease-predisposing mutations to rise to dominance.
Certain genes are enriched for passenger mutations in healthy individuals fitting purifying models of evolution, suggesting that the presence of passenger mutations in a subset of genes might confer a protective role against disease-predisposing clonal expansions. Finally, we find that the density of gene disruption events with known pathogenic associations in somatic SVs affects the frequency at which an SV segregates in the population, with variants of higher gene-disruption density segregating at lower frequencies. Understanding how blood evolves towards malignancy will allow us to capture cancer in its earliest stages and identify events initiating departures from healthy blood evolution. Further, as the majority of mutations are passengers, studying their contribution to tumorigenesis will unveil novel therapeutic targets and enable us to better understand patterns of clonal evolution in order to diagnose and treat disease in its infancy. Disclosures Dick: Bristol-Myers Squibb/Celgene: Research Funding.
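The driver-versus-passenger balance described above can be caricatured in a few lines. This is a deliberately simplified additive-fitness sketch with arbitrary parameter values, not the deep-learning/population-genetics model used in the study: a clone expands only if its driver's benefit outweighs the summed cost of its damaging passengers.

```python
def net_fitness(driver_s, passenger_cost, n_passengers):
    """Additive toy fitness: one driver's benefit minus summed passenger costs."""
    return driver_s - passenger_cost * n_passengers

def clone_trajectory(s, generations, f0=1e-4):
    """Deterministic logistic (replicator) update of a clone's frequency
    under net fitness s, starting from frequency f0."""
    f = f0
    for _ in range(generations):
        f = f * (1 + s) / (1 + f * s)
    return f
```

Under this caricature, the same driver mutation can either sweep or be held in check depending on how many passengers hitchhike with it, echoing the role of the passenger-to-driver ratio in the abstract.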


2005
Vol 23 (29)
pp. 7322-7331
Author(s):
Sven Bilke
Qing-Rong Chen
Frank Westermann
Manfred Schwab
Daniel Catchpoole
...

Purpose Knowledge of the key genomic events that are causal to cancer development and progression is not only invaluable for our understanding of cancer biology but may also have a direct clinical impact. The task of deciphering a model of tumor progression by requiring that it explain (or at least not contradict) known clinical and molecular evidence can be very demanding, particularly for cancers with complex patterns of clinical and molecular evidence. Materials and Methods We formalize the process of model inference and show how a progression model for neuroblastoma (NB) can be inferred from genomic data. The core idea of our method is to translate the model of clonal cancer evolution into mathematically testable rules of inheritance. Seventy-eight NB samples in stages 1, 4S, and 4 were analyzed with array-based comparative genomic hybridization. Results The pattern of recurrent genomic alterations in NB is strongly stage-dependent, and it is possible to identify traces of tumor progression in this type of data. Conclusion A tumor progression model for neuroblastoma is inferred that is in agreement with clinical evidence, explains part of the heterogeneity of the clinical behavior observed for NB, and is compatible with existing empirical models of NB progression.
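The "rules of inheritance" idea can be illustrated with a toy check, not the paper's algorithm: under clonal evolution, alterations fixed in an ancestral tumor stage should be inherited by descendant stages, so the stages' recurrent-alteration sets should nest. The alteration labels below are illustrative only.

```python
def consistent_with_progression(ancestor_alts, descendant_alts):
    """True if every recurrent alteration of the ancestor recurs in the descendant."""
    return set(ancestor_alts) <= set(descendant_alts)

def order_stages(stage_alts):
    """Order stages by inclusion of their recurrent-alteration sets.
    stage_alts: dict stage -> iterable of alterations. Returns the stages
    sorted from smallest to largest set, or None if the sets do not nest
    (i.e. the data contradict a simple linear progression)."""
    stages = sorted(stage_alts, key=lambda s: len(set(stage_alts[s])))
    for a, b in zip(stages, stages[1:]):
        if not consistent_with_progression(stage_alts[a], stage_alts[b]):
            return None
    return stages
```

A non-nesting pair of stages would falsify a linear ordering and force a branching progression model instead.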


2016
Author(s):
Yusuke Matsui
Atsushi Niida
Ryutaro Uchi
Koshi Mimori
Satoru Miyano
...

Abstract Motivation Multi-regional sequencing provides new opportunities to investigate genetic heterogeneity within or between common tumors from an evolutionary perspective. Several state-of-the-art methods have been proposed for reconstructing cancer sub-clonal evolutionary trees from multi-regional sequencing data to develop models of cancer evolution. However, the methods developed thus far are not sufficient to characterize and interpret the diversity of cancer sub-clonal evolutionary trees. Results We propose a clustering method (phyC) for cancer sub-clonal evolutionary trees, in which sub-groups of the trees are identified based on topology and edge-length attributes. For interpretation, we also propose a method for evaluating the diversity of trees in the clusters, which provides insight into the acceleration of sub-clonal expansion. Simulation showed that the proposed method can detect true clusters with sufficient accuracy. Application of the method to actual multi-regional sequencing data of clear cell renal carcinoma and non-small cell lung cancer allowed for the detection of clusters related to cancer type or phenotype. Availability phyC is implemented in R (>=3.2.2) and is available from https://github.com/ymatts/phyC. Contact [email protected]
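A minimal sketch of the underlying idea, not the phyC implementation: encode each rooted tree as its sorted vector of root-to-leaf depths, which coarsely captures both topology and edge lengths, and compare trees by the distance between these profiles. Trees with similar profiles would fall into the same cluster.

```python
import math

def depth_profile(tree, node="root", depth=0.0):
    """tree: dict node -> list of (child, edge_length). Returns sorted leaf depths."""
    children = tree.get(node, [])
    if not children:
        return [depth]
    out = []
    for child, length in children:
        out.extend(depth_profile(tree, child, depth + length))
    return sorted(out)

def tree_distance(t1, t2):
    """Euclidean distance between depth profiles, zero-padding the shorter
    profile so trees with different leaf counts remain comparable."""
    p1, p2 = depth_profile(t1), depth_profile(t2)
    n = max(len(p1), len(p2))
    p1 += [0.0] * (n - len(p1))
    p2 += [0.0] * (n - len(p2))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))
```

Any standard clustering routine can then be run on the resulting pairwise distance matrix.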


2020
Vol 21 (1)
Author(s):
Giulio Caravagna
Guido Sanguinetti
Trevor A. Graham
Andrea Sottoriva

Abstract Background The large-scale availability of whole-genome sequencing profiles from bulk DNA sequencing of cancer tissues is fueling the application of evolutionary theory to cancer. From a bulk biopsy, subclonal deconvolution methods are used to determine the composition of cancer subpopulations in the biopsy sample, a fundamental step in determining clonal expansions and their evolutionary trajectories. Results In recent work we developed a new model-based approach to carry out subclonal deconvolution from the site frequency spectrum of somatic mutations. This new method integrates, for the first time, an explicit model of the neutral evolutionary forces that participate in clonal expansions; in that work we also showed that our method improves substantially on competing data-driven methods. In this Software paper we present mobster, an open-source R package built around our new deconvolution approach, which provides several functions to plot data, fit models, assess their confidence, and compute further evolutionary analyses related to subclonal deconvolution. Conclusions We present the mobster package for tumour subclonal deconvolution from bulk sequencing, the first approach to integrate Machine Learning and Population Genetics and to explicitly model co-existing neutral and positive selection in cancer. We showcase the analysis of two datasets, one simulated and one from a breast cancer patient, and give an overview of all package functionalities.
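The mixture idea behind this kind of deconvolution can be sketched as follows. This is not mobster's fitting code (mobster fits the mixture by maximum likelihood); here the two components are fixed by hand, with assumed parameters: a Pareto tail for neutral mutations (fmin=0.05, shape 1) and a Beta-shaped subclonal cluster peaking near VAF 0.4. Each mutation's VAF is simply assigned to whichever component gives it the higher density.

```python
import math

def pareto_pdf(f, fmin=0.05, alpha=1.0):
    """Power-law tail density for neutral within-clone mutations."""
    if f < fmin:
        return 0.0
    return alpha * fmin ** alpha / f ** (alpha + 1)

def beta_pdf(f, a=40.0, b=60.0):
    """Beta(a, b) density; peaks near a/(a+b) = 0.4, a putative subclone VAF."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(f) + (b - 1) * math.log(1 - f))

def assign(f):
    """Hard-assign a VAF to the neutral tail or the subclonal cluster."""
    return "tail" if pareto_pdf(f) > beta_pdf(f) else "subclone"
```

The real package replaces the hard-coded parameters with fitted ones and the hard assignment with posterior responsibilities, but the tail-versus-cluster competition is the same.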


2004
Vol 02 (03)
pp. 589-593
Author(s):
Dag G. Ahrén
Christos A. Ouzounis

With the ever-increasing amount of genomic data available, interest in reconstructing biochemical pathways has grown tremendously. So far, mainly complete genomes have been used to reconstruct biochemical pathways and their associated interactions. However, a large number of low-coverage genomes, as well as other sources of partial genomic data, are currently available for many organisms. To be able to use incomplete data for metabolic reconstruction, the inherent properties of this procedure need to be investigated. In this short note, we describe the robustness and predictive power of metabolic reconstructions using partial information from Schizosaccharomyces pombe. We also discuss the implications of the results for reference genome projects as well as other large-scale sequencing data.


2015
Author(s):
Andrea Sottoriva
Trevor Graham

Despite extraordinary efforts to profile cancer genomes on a large scale, interpreting the vast amount of genomic data in the light of cancer evolution, and in a clinically relevant manner, remains challenging. Here we demonstrate that cancer next-generation sequencing data are dominated by the signature of growth, governed by a power-law distribution of mutant allele frequencies. The power-law signature is common to multiple tumor types and is a consequence of the effectively neutral evolutionary dynamics that underpin the evolution of a large proportion of cancers, giving rise to the abundance of mutations responsible for intra-tumor heterogeneity. Importantly, the law allows the measurement, in each individual cancer, of the in vivo mutation rate and the timing of mutations with remarkable precision. This result provides a new way to interpret cancer genomic data by considering the physics of tumor growth in a way that is both patient-specific and clinically relevant.
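The power-law signature has a simple idealized derivation: in a synchronously doubling population where each daughter cell acquires one new mutation, a mutation born at generation g ends up at frequency 1/2^g, and there are 2^g such mutations, so the cumulative number of mutations at frequency >= f grows like 1/f. A deterministic sketch of this argument (ignoring cell death, variable mutation rates, and stochastic drift):

```python
def cumulative_mutations(generations):
    """For a perfect doubling population with one mutation per new cell,
    return (frequency, M(frequency)) pairs, where M(f) counts mutations
    present at frequency >= f in the final population."""
    out = []
    total = 0
    for g in range(1, generations + 1):
        total += 2 ** g          # 2^g new cells at generation g, one mutation each
        out.append((2.0 ** -g, total))
    return out
```

In this idealization M(f) * f converges to a constant, i.e. M(f) is proportional to 1/f, which is the signature the abstract describes in real sequencing data.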


2020
Vol 66 (1)
pp. 39-52
Author(s):
Tomoya Tanjo
Yosuke Kawai
Katsushi Tokunaga
Osamu Ogasawara
Masao Nagasaki

Abstract Studies in human genetics deal with a plethora of human genome sequencing data that are generated from specimens as well as available in public domains. With the development of various bioinformatics applications, maintaining the productivity of research, managing human genome data, and performing downstream analyses are essential. This review aims to guide struggling researchers in processing and analyzing these large-scale genomic data to extract relevant information for improved downstream analyses. Here, we discuss worldwide human genome projects whose data could be integrated into any analysis for improvement. Because obtaining human whole-genome sequencing data is costly in both storage and processing, we then focus on data formats and software for manipulating whole-genome sequencing data. Once the sequencing data, formats, and processing tools are selected, a computational platform is required. For the platform, we describe a multi-cloud strategy that balances cost, performance, and customizability. Good-quality published research relies on data reproducibility to ensure quality results, reusability for application to other datasets, and scalability for future increases in dataset size. To address these needs, we describe several key technologies developed in computer science, including workflow engines. We also discuss the ethical guidelines, which differ from those for model organisms, that are indispensable for human genomic data analysis. Finally, an ideal future perspective on data processing and analysis is summarized.


2021
Vol 2021
pp. 1-6
Author(s):
Yanjun Ma

Personal genomic data constitute an important part of personal health data. However, because of the large amount of personal genomic data produced by next-generation sequencing technology, special tools are needed to analyze them. In this article, we explore a tool for analyzing cloud-based large-scale genome sequencing data. Analyzing and identifying genomic variations from amplicon-based next-generation sequencing data is necessary for the clinical diagnosis and treatment of cancer patients. When processing amplicon-based next-generation sequencing data, one essential step is removing primer sequences from the reads to avoid detecting false-positive mutations introduced by nonspecific primer binding and primer extension reactions. At present, primer-removal tools usually discard primer sequences from the FASTQ file rather than the BAM file, but this approach can cause problems in downstream analysis. Only one tool (BAMClipper) removes primer sequences from BAM files, but it only modifies the CIGAR value of the BAM file, and false-positive mutations falling in the primer region can still be detected from its processed BAM file. We therefore developed a primer-trimming tool (rmvPFBAM) that removes primer sequences from the BAM file; mutations detected from the BAM file processed by rmvPFBAM are highly credible. Besides that, rmvPFBAM runs faster than other tools such as cutPrimers and BAMClipper.
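The effect of BAM-level primer removal can be sketched in a few lines. This is a simplified illustration, not rmvPFBAM's implementation: it assumes an indel-free alignment and masks (rather than soft-clips) the read bases that overlap a known primer interval, so those bases can no longer support a variant call.

```python
def mask_primer_bases(read_start, read_seq, primer_start, primer_end):
    """Replace read bases overlapping the half-open reference interval
    [primer_start, primer_end) with 'N'. read_start is the 0-based reference
    position of the read's first base; the alignment is assumed indel-free."""
    out = []
    for offset, base in enumerate(read_seq):
        pos = read_start + offset
        out.append("N" if primer_start <= pos < primer_end else base)
    return "".join(out)
```

A real BAM-aware tool would instead walk the CIGAR string to map read offsets to reference positions and would rewrite both the sequence and the CIGAR, which is exactly the part BAMClipper leaves incomplete according to the abstract.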


2017
Author(s):
L. Alexander Liggett
Anchal Sharma
Subhajyoti De
James DeGregori

Abstract With growing interest in monitoring mutational processes in normal tissues, tumor heterogeneity, and cancer evolution under therapy, the ability to accurately and economically detect ultra-rare mutations is becoming increasingly important. However, this capability has often been compromised by significant sequencing, PCR, and DNA preparation error rates. Here, we describe FERMI (Fast Extremely Rare Mutation Identification), a novel method designed to eliminate the majority of these sequencing and library preparation errors in order to significantly improve rare somatic mutation detection. This method leverages barcoded targeting probes to capture and sequence DNA of interest with single-copy resolution. The variant calls from the barcoded sequencing data are then further filtered in a position-dependent fashion against an adaptive, context-aware null model in order to distinguish true variants. As a proof of principle, we employ FERMI to probe bone marrow biopsies from leukemia patients, and show that rare mutations and clonal evolution can be tracked throughout cancer treatment, including during historically intractable periods such as minimal residual disease. Importantly, FERMI is able to accurately detect nascent clonal expansions within leukemias in a manner that may facilitate the early detection and characterization of cancer relapse.
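The barcode-consensus step can be sketched as a per-position majority vote over reads sharing a barcode: reads from the same original molecule should agree, so a base seen in only a minority of them is treated as a sequencing or PCR error. FERMI's additional position-dependent null-model filtering is omitted from this minimal illustration.

```python
from collections import Counter

def consensus(reads, min_fraction=0.7):
    """Collapse same-barcode reads of equal length into a consensus sequence.
    A base is called only if it appears in at least min_fraction of the reads
    at that position; otherwise the position is marked 'N'."""
    out = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)
```

Because an error must recur across independent copies of the same molecule to survive this vote, the consensus sequence suppresses most library-preparation artifacts before variant calling.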


2020
Vol 37 (12)
pp. 3684-3698
Author(s):
Ruidong Li
Han Qu
Jinfeng Chen
Shibo Wang
John M Chater
...

Abstract Compared with genomic data on individual markers, haplotype data provide higher resolution for DNA variants, advancing our knowledge of genetics and evolution. Although many computational and experimental phasing methods have been developed for analyzing diploid genomes, it remains challenging to reconstruct chromosome-scale haplotypes at low cost, which constrains the utility of this valuable genetic resource. Gamete cells, the natural packaging of haploid complements, are ideal materials for phasing entire chromosomes because the majority of the haplotypic allele combinations have been preserved. Therefore, compared with current diploid-based phasing methods, using haploid genomic data of single gametes may substantially reduce the complexity of inferring the donor's chromosomal haplotypes. In this study, we developed the first easy-to-use R package, Hapi, for inferring chromosome-length haplotypes of individual diploid genomes with only a few gametes. Hapi outperformed other phasing methods when analyzing both simulated and real single-gamete-cell sequencing data sets. The results also suggested that chromosome-scale haplotypes may be inferred by using as few as three gametes, which has pushed the boundary to its possible limit. Single-gamete-cell sequencing technology allied with the cost-effective Hapi method will make large-scale haplotype-based genetic studies feasible and affordable, promoting the use of haplotype data in a wide range of research.
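The intuition behind gamete-based phasing can be sketched as follows, ignoring recombination and genotyping error (both of which Hapi actually handles): alleles at heterozygous sites are coded 0/1 with '?' for missing data, each gamete carries one of the donor's two complementary haplotypes, and every gamete is oriented against the first before a per-site majority vote recovers the pair.

```python
def flip(g):
    """Complement a haploid genotype vector, leaving missing calls untouched."""
    return [1 - a if a in (0, 1) else a for a in g]

def orient(gamete, reference):
    """Flip the gamete if it disagrees with the reference at most observed sites."""
    agree = disagree = 0
    for a, b in zip(gamete, reference):
        if a in (0, 1) and b in (0, 1):
            agree += a == b
            disagree += a != b
    return flip(gamete) if disagree > agree else gamete

def phase(gametes):
    """Return the donor's two complementary haplotypes from a few gametes."""
    ref = gametes[0]
    oriented = [orient(g, ref) for g in gametes]
    hap = []
    for column in zip(*oriented):
        obs = [a for a in column if a in (0, 1)]
        hap.append(round(sum(obs) / len(obs)) if obs else None)
    return hap, [1 - a if a is not None else None for a in hap]
```

With complementary gametes covering each other's missing sites, even three gametes can determine the full haplotype pair, mirroring the lower bound reported in the abstract.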

