Creating Standards for Evaluating Tumour Subclonal Reconstruction

Mapping Intimacies ◽

10.1101/310425 ◽

2018 ◽

Cited By ~ 3

Author(s):

Adriana Salcedo ◽

Maxime Tarabichi ◽

Shadrielle Melijah G. Espiritu ◽

Amit G. Deshwar ◽

Matei David ◽

...

Keyword(s):

Evolutionary Dynamics ◽

Read Depth ◽

Response To Therapy ◽

Systematic Evaluation ◽

Computational Techniques ◽

Cancer Evolution ◽

Sequencing Data ◽

Somatic Variant ◽

Quantitative Metrics ◽

Algorithmic Problems

Abstract: Tumours evolve through time and space. Computational techniques have been developed to infer their evolutionary dynamics from DNA sequencing data. A growing number of studies have used these approaches to link molecular cancer evolution to clinical progression and response to therapy. There has not yet been a systematic evaluation of methods for reconstructing tumour subclonality, in part due to the underlying mathematical and biological complexity and to difficulties in creating gold standards. To fill this gap, we systematically elucidated the key algorithmic problems in subclonal reconstruction and developed mathematically valid quantitative metrics for evaluating them. We then created approaches to simulate realistic tumour genomes, harbouring all known mutation types and processes both clonally and subclonally. We then simulated 580 tumour genomes for reconstruction, varying tumour read-depth and benchmarking somatic variant detection and subclonal reconstruction strategies. The inference of tumour phylogenies is rapidly becoming standard practice in cancer genome analysis; this study creates a baseline for its evaluation.


Opposing Evolutionary Pressures Drive Clonal Evolution and Health Outcomes in the Aging Blood System

Blood ◽

10.1182/blood-2020-142086 ◽

2020 ◽

Vol 136 (Supplement 1) ◽

pp. 37-37

Author(s):

Kimberly Skead ◽

Armande Ang Houle ◽

Sagi Abelson ◽

Marie-Julie Fave ◽

Boxi Lin ◽

...

Keyword(s):

Gene Disruption ◽

Large Scale ◽

Evolutionary Dynamics ◽

Clonal Evolution ◽

Neutral Evolution ◽

Driver Mutations ◽

Cancer Evolution ◽

Sequencing Data ◽

High Coverage ◽

Passenger Mutations

The age-associated accumulation of somatic mutations and large-scale structural variants (SVs) in the early hematopoietic hierarchy has been linked to premalignant stages of cancer and cardiovascular disease (CVD). However, only a small proportion of individuals harboring these mutations progress to disease, and the mechanisms driving the transformation to malignancy remain unclear. Hematopoietic evolution, and cancer evolution more broadly, has largely been studied through a lens of adaptive evolution, and the contribution of functionally neutral or mildly damaging mutations to early disease-associated clonal expansions has not been well characterised, even though such mutations comprise the majority of the mutational burden in healthy or tumoural tissues. By combining deep learning with population genetics, we interrogate the hematopoietic system to capture signatures of selection acting in healthy and pre-cancerous blood populations. Here, we leverage high-coverage sequencing data from healthy and pre-cancerous individuals from the European Prospective Investigation into Cancer and Nutrition Study (n=477) and dense genotyping from the Canadian Partnership for Tomorrow's Health (n=5,000) to show that blood rejects the paradigm of strictly adaptive or neutral evolution and is subject to pervasive negative selection. We observe clear age associations across hematopoietic populations and the dominant class of selection driving evolutionary dynamics acting at an individual level. We find that both the location and ratio of passenger to driver mutations are critical in determining if positive selection acting on driver mutations is able to overwhelm regulated hematopoiesis and allow clones harbouring disease-predisposing mutations to rise to dominance. Certain genes are enriched for passenger mutations in healthy individuals fitting purifying models of evolution, suggesting that the presence of passenger mutations in a subset of genes might confer a protective role against disease-predisposing clonal expansions. Finally, we find that the density of gene disruption events with known pathogenic associations in somatic SVs impacts the frequency at which the SV segregates in the population, with variants displaying higher gene disruption density segregating at lower frequencies. Understanding how blood evolves towards malignancy will allow us to capture cancer in its earliest stages and identify events initiating departures from healthy blood evolution. Further, as the majority of mutations are passengers, studying their contribution to tumorigenesis will unveil novel therapeutic targets, enabling us to better understand patterns of clonal evolution in order to diagnose and treat disease in its infancy.

Disclosures: Dick: Bristol-Myers Squibb/Celgene: Research Funding.


Measuring the distribution of fitness effects in somatic evolution by combining clonal dynamics with dN/dS ratios

10.1101/661264 ◽

2019 ◽

Author(s):

Marc J Williams ◽

Luiz Zapata ◽

Benjamin Werner ◽

Chris Barnes ◽

Andrea Sottoriva ◽

...

Keyword(s):

Evolutionary Dynamics ◽

Quantitative Model ◽

Driver Mutations ◽

Cancer Evolution ◽

Sequencing Data ◽

Synonymous Mutations ◽

Fitness Effects ◽

Somatic Evolution ◽

Normal Oesophagus

Abstract: The distribution of fitness effects (DFE) defines how new mutations spread through an evolving population. The ratio of non-synonymous to synonymous mutations (dN/dS) has become a popular method to detect selection in somatic cells; however, the link between dN/dS values and fitness coefficients in somatic evolution is missing. Here we present a quantitative model of somatic evolutionary dynamics that yields the selective coefficients for individual driver mutations from dN/dS estimates, and then measure the DFE for somatic mutant clones in ostensibly normal oesophagus and skin. We reveal a broad distribution of fitness effects, with the largest fitness increases found for TP53 and NOTCH1 mutants (proliferative bias 1-5%). Accurate measurement of the per-gene DFE in cancer evolution is precluded by the quality of currently available sequencing data. This study provides the theoretical link between dN/dS values and selective coefficients in somatic evolution, and reveals the DFE for mutations in human tissues.
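
For readers unfamiliar with the dN/dS ratio mentioned in this abstract, the sketch below shows only the basic definition: the rate of non-synonymous mutations per non-synonymous site divided by the rate of synonymous mutations per synonymous site. It is not the authors' model linking dN/dS to selective coefficients; the function name, arguments, and counts are hypothetical and chosen for illustration.

```python
def dn_ds(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Return a naive dN/dS ratio from mutation counts and site counts.

    All inputs are illustrative assumptions: observed non-synonymous and
    synonymous mutation counts, and the number of sites at which each
    class of mutation could occur.
    """
    if n_syn <= 0 or nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("synonymous count and site counts must be positive")
    dn = n_nonsyn / nonsyn_sites  # non-synonymous mutations per site
    ds = n_syn / syn_sites        # synonymous mutations per site
    return dn / ds


# Hypothetical counts: equal per-site rates give a ratio of 1 (no net selection signal).
neutral_like = dn_ds(n_nonsyn=30, n_syn=10, nonsyn_sites=3000, syn_sites=1000)
# Doubling the non-synonymous count gives a ratio above 1 (excess of non-synonymous changes).
positive_like = dn_ds(n_nonsyn=60, n_syn=10, nonsyn_sites=3000, syn_sites=1000)
```

A ratio near 1 is conventionally read as consistent with neutrality, above 1 as an excess of non-synonymous mutations, and below 1 as a deficit; real estimators also correct for mutational context, which this sketch omits.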


Evolutionary dynamics of neoantigens in growing tumours

10.1101/536433 ◽

2019 ◽

Cited By ~ 8

Author(s):

Eszter Lakatos ◽

Marc J. Williams ◽

Ryan O. Schenck ◽

William C. H. Cross ◽

Jacob Househam ◽

...

Keyword(s):

Negative Selection ◽

Evolutionary Dynamics ◽

Immune Escape ◽

Patient Specific ◽

Mathematical Framework ◽

Cancer Evolution ◽

Sequencing Data ◽

Clone Size ◽

Strong Negative Selection ◽

Immune Escape Mechanisms

Abstract: Cancer evolution is driven by the acquisition of somatic mutations that provide cells with a beneficial phenotype in a changing microenvironment. However, mutations that give rise to neoantigens, novel cancer-specific peptides that elicit an immune response, are likely to be disadvantageous. Here we show how the clonal structure and immunogenotype of growing tumours is shaped by negative selection in response to neoantigenic mutations. We construct a mathematical model of neoantigen evolution in a growing tumour, and verify the model using genomic sequencing data. The model predicts that, in the absence of active immune escape mechanisms, tumours either evolve clonal neoantigens (antigen-‘hot’), or have no clonally expanded neoantigens at all (antigen-‘cold’), whereas antigen-‘warm’ tumours (with high frequency subclonal neoantigens) form only following the evolution of immune evasion. Counterintuitively, strong negative selection for neoantigens during tumour formation leads to an increased number of antigen-warm or -hot tumours, as a consequence of selective pressure for immune escape. Further, we show that the clone size distribution under negative selection is effectively neutral, and moreover, that stronger negative selection paradoxically leads to more neutral-like dynamics. Analysis of antigen clone sizes and immune escape in colorectal cancer exome sequencing data confirms these results. Overall, we provide and verify a mathematical framework to understand the evolutionary dynamics and clonality of neoantigens in human cancers that may inform patient-specific immunotherapy decision-making.


The MURAL collection of prostate cancer patient-derived xenografts enables discovery through preclinical models of uro-oncology

Nature Communications ◽

10.1038/s41467-021-25175-5 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Gail P. Risbridger ◽

Ashlee K. Clark ◽

Laura H. Porter ◽

Roxanne Toivanen ◽

Andrew Bakshi ◽

...

Keyword(s):

Prostate Cancer ◽

Cancer Therapeutics ◽

Systematic Evaluation ◽

Sequencing Data ◽

Primary Tumors ◽

Combination Treatments ◽

Castration Resistant ◽

Substantial Resource ◽

Treatment Naïve ◽

Research Alliance

Abstract: Preclinical testing is a crucial step in evaluating cancer therapeutics. We aimed to establish a significant resource of patient-derived xenografts (PDXs) of prostate cancer for rapid and systematic evaluation of candidate therapies. The PDX collection comprises 59 tumors collected from 30 patients between 2012 and 2020, coinciding with availability of abiraterone and enzalutamide. The PDXs represent the clinico-pathological and genomic spectrum of prostate cancer, from treatment-naïve primary tumors to castration-resistant metastases. Inter- and intra-tumor heterogeneity in adenocarcinoma and neuroendocrine phenotypes is evident from bulk and single-cell RNA sequencing data. Organoids can be cultured from PDXs, providing further capabilities for preclinical studies. Using a 1 x 1 x 1 design, we rapidly identify tumors with exceptional responses to combination treatments. To govern the distribution of PDXs, we formed the Melbourne Urological Research Alliance (MURAL). This PDX collection is a substantial resource, expanding the capacity to test and prioritize effective treatments for prospective clinical trials in prostate cancer.


Measuring evolutionary cancer dynamics from genome sequencing, one patient at a time

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2020-0075 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Giulio Caravagna

Keyword(s):

Genome Sequencing ◽

Cancer Evolution ◽

Sequencing Data ◽

Evolutionary Forces ◽

Sequencing Technologies ◽

Cancer Genome Sequencing ◽

Multiple Resolutions ◽

Multiple Patients ◽

Single Tumour ◽

Generation Sequencing

Abstract: Cancers progress through the accumulation of somatic mutations which accrue during tumour evolution, allowing some cells to proliferate in an uncontrolled fashion. This growth process is intimately related to latent evolutionary forces moulding the genetic and epigenetic composition of tumour subpopulations. Understanding cancer therefore requires understanding these selective pressures. The widespread adoption of next-generation sequencing technologies opens up the possibility of measuring molecular profiles of cancers at multiple resolutions, across one or multiple patients. In this review we discuss how cancer genome sequencing data from a single tumour can be used to understand these evolutionary forces, overviewing mathematical models and inferential methods adopted in the field of Cancer Evolution.


CliP: subclonal architecture reconstruction of cancer cells in DNA sequencing data using a penalized likelihood model

10.1101/2021.03.31.437383 ◽

2021 ◽

Author(s):

Yujie Jiang ◽

Kaixian Yu ◽

Shuangxi Ji ◽

Seung Jun Shin ◽

Shaolong Cao ◽

...

Keyword(s):

Prior Knowledge ◽

Penalized Likelihood ◽

Software Tool ◽

Structure Identification ◽

Tumor Evolution ◽

Cancer Evolution ◽

Post Processing ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Likelihood Model

Subpopulations of tumor cells characterized by mutation profiles may confer differential fitness and consequently influence prognosis of cancers. Understanding subclonal architecture has the potential to provide biological insight into tumor evolution and advance precision cancer treatment. Recent methods comprehensively integrate single nucleotide variants (SNVs) and copy number aberrations (CNAs) to reconstruct subclonal architecture using whole-genome or whole-exome sequencing (WGS, WES) data from bulk tumor samples. However, the commonly used Bayesian methods require a large amount of computational resources, prior knowledge of the number of subclones, and extensive post-processing. A regularized likelihood modeling approach, not previously explored for subclonal reconstruction, can inherently address these drawbacks. We therefore propose a model-based method, Clonal structure identification through pair-wise Penalization, or CliP, for clustering subclonal mutations without prior knowledge or post-processing. The CliP model is applicable to genomic regions with or without CNAs. CliP demonstrates high accuracy in subclonal reconstruction through extensive simulation studies. Utilizing the well-established regularized likelihood framework, CliP takes only 16 hours to process WGS data from 2,778 tumor samples in the ICGC-PCAWG study, and 38 hours to process WES data from 9,564 tumor samples in the TCGA study. In summary, a penalized likelihood framework for subclonal reconstruction will help address intrinsic drawbacks of existing methods and expand the scope of computational analysis for cancer evolution in large cancer genomic studies. The associated software tool is freely available at: https://github.com/wwylab/CliP.


Divergent and convergent evolution of housekeeping genes in human–pig lineage

PeerJ ◽

10.7717/peerj.4840 ◽

2018 ◽

Vol 6 ◽

pp. e4840 ◽

Cited By ~ 4

Author(s):

Kai Wei ◽

Tingting Zhang ◽

Lei Ma

Keyword(s):

Active Sites ◽

Evolutionary Dynamics ◽

Purifying Selection ◽

Housekeeping Genes ◽

Neutral Evolution ◽

Structure Evolution ◽

Tissue Cell ◽

Sequencing Data ◽

Cellular Functions ◽

Species Specific

Housekeeping genes are ubiquitously expressed and maintain basic cellular functions across tissue/cell type conditions. The present study aimed to develop a set of pig housekeeping genes and compare the structure, evolution and function of housekeeping genes in the human–pig lineage. By using RNA sequencing data, we identified 3,136 pig housekeeping genes. Compared with human housekeeping genes, we found that pig housekeeping genes were longer and subjected to slightly weaker purifying selection pressure and faster neutral evolution. Common housekeeping genes, shared by the two species, are under stronger purifying selection than species-specific genes. However, pig- and human-specific housekeeping genes have similar functions. Some species-specific housekeeping genes have evolved independently to form similar protein active sites or structures, such as the classical catalytic serine–histidine–aspartate triad, implying that they have converged to maintain the basic cellular function, which allows them to adapt to the environment. Human and pig housekeeping genes have varied structures and gene lists, but they have converged to maintain basic cellular functions essential for the existence of a cell, regardless of its specific role in the species. The results of our study shed light on the evolutionary dynamics of housekeeping genes.


Spatially constrained tumour growth affects the patterns of clonal selection and neutral drift in cancer genomic data

10.1101/544536 ◽

2019 ◽

Cited By ~ 3

Author(s):

Kate Chkhaidze ◽

Timon Heide ◽

Benjamin Werner ◽

Marc J. Williams ◽

Weini Huang ◽

...

Keyword(s):

Next Generation Sequencing ◽

Tumour Growth ◽

Evolutionary Dynamics ◽

Clonal Selection ◽

Genomic Data ◽

Confounding Factors ◽

Data Generation ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Abstract: Quantification of the effect of spatial tumour sampling on the patterns of mutations detected in next-generation sequencing data is largely lacking. Here we use a spatial stochastic cellular automaton model of tumour growth that accounts for somatic mutations, selection, drift and spatial constraints, to simulate multi-region sequencing data derived from spatial sampling of a neoplasm. We show that the spatial structure of a solid cancer has a major impact on the detection of clonal selection and genetic drift from bulk sequencing data and single-cell sequencing data. Our results indicate that spatial constraints can introduce significant sampling biases when performing multi-region bulk sampling and that such bias becomes a major confounding factor for the measurement of the evolutionary dynamics of human tumours. We present a statistical inference framework that takes into account the spatial effects of a growing tumour and allows inferring the evolutionary dynamics from patient genomic data. Our analysis shows that measuring cancer evolution using next-generation sequencing while accounting for the numerous confounding factors requires a mechanistic model-based approach that captures the sources of noise in the data.

Summary: Sequencing the DNA of cancer cells from human tumours has become one of the main tools to study cancer biology. However, sequencing data are complex and often difficult to interpret. In particular, the way in which the tissue is sampled and the data are collected impacts the interpretation of the results significantly. We argue that understanding cancer genomic data requires mathematical models and computer simulations that tell us what we expect the data to look like, with the aim of understanding the impact of confounding factors and biases in the data generation step. In this study, we develop a spatial simulation of tumour growth that also simulates the data generation process, and demonstrate that biases in the sampling step and current technological limitations severely impact the interpretation of the results. We then provide a statistical framework that can be used to overcome these biases and more robustly measure aspects of the biology of tumours from the data.


CNV-P: a machine-learning framework for predicting high confident copy number variations

PeerJ ◽

10.7717/peerj.12564 ◽

2021 ◽

Vol 9 ◽

pp. e12564

Author(s):

Taifu Wang ◽

Jinghua Sun ◽

Xiuqing Zhang ◽

Wen-Jing Wang ◽

Qing Zhou

Keyword(s):

Machine Learning ◽

False Positive ◽

Copy Number ◽

Genetic Disorders ◽

Genetic Diseases ◽

Basic Research ◽

Read Depth ◽

Copy Number Variations ◽

Sequencing Data ◽

Learning Framework

Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current software for detecting CNVs has high false-positive rates and needs further improvement. Methods: Here, we propose a novel post-processing approach for CNV prediction (CNV-P), a machine-learning framework that can efficiently remove false-positive fragments from the results of CNV detection tools. A series of CNV signals, such as read depth (RD), split reads (SR) and read pairs (RP) around the putative CNV fragments, were defined as features to train a classifier. Results: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision and 85% recall, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. Conclusions: Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.
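
The precision and recall figures quoted in this abstract follow the standard definitions, which can be sketched as below. This is only an illustration of the metrics, not CNV-P's evaluation code; the function name and the counts of true positives, false positives, and false negatives are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from classification counts.

    tp: CNV calls kept by the classifier that are real
    fp: CNV calls kept by the classifier that are false
    fn: real CNVs that the classifier discarded
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, chosen only to illustrate the arithmetic.
precision, recall = precision_recall(tp=90, fp=10, fn=15)
```

Precision falls as false-positive calls survive filtering, while recall falls as true CNVs are discarded, so a post-processing filter has to be judged on both.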


Somatic variant analysis of linked-reads sequencing data with Lancet

Bioinformatics ◽

10.1093/bioinformatics/btaa888 ◽

2020 ◽

Author(s):

Rajeeva Musunuri ◽

Kanika Arora ◽

André Corvelo ◽

Minita Shah ◽

Jennifer Shelton ◽

...

Keyword(s):

Supplementary Information ◽

De Bruijn Graph ◽

Haplotype Structure ◽

Sequencing Data ◽

Somatic Variant ◽

Local Assembly ◽

De Bruijn ◽

Variant Analysis ◽

Colored De Bruijn Graph ◽

Commercial Research

Summary: We present a new version of the popular somatic variant caller, Lancet, that supports the analysis of linked-reads sequencing data. By seamlessly integrating barcodes and haplotype read assignments within the colored De Bruijn graph local-assembly framework, Lancet computes a barcode-aware coverage and identifies variants that disagree with the local haplotype structure. Availability and implementation: Lancet is implemented in C++ and available for academic and non-commercial research purposes as an open-source package at https://github.com/nygenome/lancet. Supplementary information: Supplementary data are available at Bioinformatics online.
