Ccube: A fast and robust method for estimating cancer cell fractions

SVclone: inferring structural variant cancer cell fraction

10.1101/172486
2017
Cited By ~ 3

Author(s): Marek Cmero, Cheng Soon Ong, Ke Yuan, Jan Schröder, Kangbo Mo, ...

Keyword(s): Cancer Cell, Copy Number, Computational Method, Cell Fraction, Whole Genome Sequencing Data, Whole Genome, Sequencing Data, Breast And Ovarian Cancer, Structural Variant, Cancer Cell Fraction

We present SVclone, a computational method for inferring the cancer cell fraction of structural variant breakpoints from whole-genome sequencing data. We validate our approach using simulated and real tumour samples, and demonstrate its utility on 2,778 whole-genome sequenced tumours. We find a subset of liver, breast and ovarian cancer cases with decreased overall survival that have subclonally enriched copy-number neutral rearrangements, an observation that could not be discovered with currently available methods.
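
The abstract does not spell out SVclone's model, so the following is only a minimal Python sketch of the standard relationship between a variant's allele fraction and its cancer cell fraction (CCF), applied to a breakpoint. It assumes that breakpoint support is split reads plus discordant pairs, that reads spanning the breakpoint without supporting it count as reference, that tumour purity p and total copy number CN are known, that normal cells are diploid and that the SV sits on m = 1 copy. `breakpoint_vaf` and `ccf_from_vaf` are hypothetical helpers, not SVclone's API.

```python
def breakpoint_vaf(split_reads: int, discordant_pairs: int, ref_reads: int) -> float:
    """Fraction of breakpoint-informative reads that support the rearrangement.

    split_reads and discordant_pairs support the SV; ref_reads span the
    breakpoint without supporting it (hypothetical read categories).
    """
    support = split_reads + discordant_pairs
    total = support + ref_reads
    if total == 0:
        raise ValueError("no informative reads at this breakpoint")
    return support / total


def ccf_from_vaf(vaf: float, purity: float, total_cn: float,
                 multiplicity: int = 1, normal_cn: float = 2.0) -> float:
    """Solve VAF = p*m*CCF / (p*CN + normal_cn*(1 - p)) for CCF.

    Values above 1 suggest the assumed copy number or multiplicity is wrong.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    mixed_cn = purity * total_cn + normal_cn * (1.0 - purity)
    return vaf * mixed_cn / (purity * multiplicity)


if __name__ == "__main__":
    vaf = breakpoint_vaf(split_reads=6, discordant_pairs=4, ref_reads=30)
    print(vaf, ccf_from_vaf(vaf, purity=0.5, total_cn=2))  # 0.25 1.0
```

With 10 supporting and 30 reference reads at 50% purity and two copies, the breakpoint would be read as clonal (CCF 1.0).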

Absolute copy number fitting from shallow whole genome sequencing data

10.1101/2021.07.19.452658
2021

Author(s): Carolin M Sauer, Matthew D Eldridge, Maria Vias, James A Hall, Samantha E Boyle, ...

Keyword(s): Whole Genome Sequencing, Cancer Cell, Genome Sequencing, Copy Number, Point Mutations, Whole Genome Sequencing Data, Whole Genome, Sequencing Data, Tissue Samples, Absolute Copy Number

Low-coverage or shallow whole genome sequencing (sWGS) approaches can efficiently detect somatic copy number aberrations (SCNAs) at low cost. This is clinically important for many cancers, in particular cancers with severe chromosomal instability (CIN) that frequently lack actionable point mutations and are characterised by poor disease outcome. Absolute copy number (ACN), measured in DNA copies per cancer cell, is required for meaningful comparisons between copy number states, but is challenging to estimate and in practice often requires manual curation. Using a total of 60 cancer cell lines, 148 patient-derived xenograft (PDX) and 142 clinical tissue samples, we evaluate the performance of available tools for obtaining ACN from sWGS. We provide a validated and refined tool called Rascal (relative to absolute copy number scaling) that provides improved fitting algorithms and enables interactive visualisation of copy number profiles. These approaches are highly applicable to both pre-clinical and translational research studies on SCNA-driven cancers and provide more robust ACN fits from sWGS data than currently available tools.
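
Rascal's fitting algorithms are not described in the abstract. As a minimal sketch under assumed conventions (relative copy number normalised so that its genome-wide mean is 1, diploid normal contamination, and purity p and tumour ploidy psi already chosen), relative and absolute copy number are related by r = (p*ACN + 2*(1 - p)) / (p*psi + 2*(1 - p)), which the hypothetical helpers below invert. This is the common purity/ploidy scaling, not necessarily Rascal's implementation.

```python
def absolute_to_relative(acn: float, purity: float, ploidy: float) -> float:
    """Expected relative copy number (segment depth / genome-wide mean depth)
    of a segment with `acn` copies per cancer cell, assuming diploid normal cells."""
    normal = 2.0 * (1.0 - purity)
    return (purity * acn + normal) / (purity * ploidy + normal)


def relative_to_absolute(rel: float, purity: float, ploidy: float) -> float:
    """Invert absolute_to_relative: scale a relative copy number to copies per cancer cell."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    normal = 2.0 * (1.0 - purity)
    return (rel * (purity * ploidy + normal) - normal) / purity


if __name__ == "__main__":
    # A segment at 1.3x the genome-wide mean depth, in a sample with
    # purity 0.6 and tumour ploidy 2, maps to about 3 copies per cancer cell.
    print(round(relative_to_absolute(1.3, purity=0.6, ploidy=2.0), 6))  # 3.0
```

Because purity and ploidy are usually unknown, and several (purity, ploidy) pairs can place segments equally close to integer copy numbers, choosing them is the hard part of the problem; the sketch only shows the scaling once they are fixed.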

SMuRF: Portable and accurate ensemble-based somatic variant calling

10.1101/270413
2018
Cited By ~ 2

Author(s): Weitai Huang, Yu Amanda Guo, Karthik Muthukumar, Probhonjon Baruah, Meimei Chang, ...

Keyword(s): Point Mutations, Variant Calling, Whole Genome Sequencing Data, Sequencing Data, Somatic Variant, Level Data, Machine Learning Approach, Cancer Types, User Friendly, Improved Accuracy

SMuRF is an ensemble method for predicting somatic point mutations (SNVs) and small insertions/deletions (indels) in cancer genomes. The method integrates predictions and auxiliary features from different somatic mutation callers using a Random Forest machine learning approach. SMuRF is trained on community-curated tumor whole genome sequencing data, is robust across cancer types, and achieves improved accuracy for both SNV and indel predictions on genome- and exome-level data. The software is user-friendly and portable by design, operating as an add-on to the community-developed bcbio-nextgen somatic variant calling pipeline.
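
The abstract names a Random Forest over caller predictions and auxiliary features but gives no feature set or settings. The following is a deliberately simplified, standard-library-only sketch of the idea: a random-forest-style ensemble of depth-1 trees (decision stumps), each fitted on a bootstrap sample and a random subset of features, combined by majority vote. The feature columns (three callers' pass flags plus tumour VAF), the toy labels and all function names are hypothetical; SMuRF itself is not reproduced here.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, default):
    """Most common label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: the (feature, threshold) split with the lowest weighted Gini impurity."""
    fallback = majority(y, 0)
    best = None
    for j in feature_ids:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left, fallback), majority(right, fallback))
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Bag stumps over bootstrap samples, each restricted to a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        feats = rng.sample(range(d), max_features)  # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the stumps."""
    votes = [left if row[j] <= t else right for j, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Columns (hypothetical): caller A pass, caller B pass, caller C pass, tumour VAF.
    X = [[1, 1, 1, 0.30], [1, 1, 0, 0.25], [1, 1, 1, 0.40], [0, 1, 1, 0.22], [1, 1, 1, 0.35],
         [0, 0, 0, 0.03], [1, 0, 0, 0.05], [0, 0, 0, 0.02], [0, 0, 1, 0.04], [0, 0, 0, 0.06]]
    y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = true somatic variant (toy labels)
    forest = fit_forest(X, y)
    print(predict(forest, [1, 1, 1, 0.50]), predict(forest, [0, 0, 0, 0.00]))
```

A production model would use deeper trees and far more training variants and features; stumps keep the example short while preserving bootstrap sampling, random feature selection and voting.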

Genomic alterations driving breast cancer (BC) metastases and their relationship with the subtype switch in the GEICAM ConvertHER study.

Journal of Clinical Oncology
10.1200/jco.2017.35.15_suppl.1017
2017
Vol 35 (15_suppl)
pp. 1017-1017
Cited By ~ 1

Author(s): Joan Albanell, Abel Gonzalez, Ana M. Gonzalez-Angulo, Agda Karina Eterovic, Eduardo Martinez-De Duenas, ...

Keyword(s): Cancer Cell, Intrinsic Subtype, Cell Fraction, Driver Mutations, Her2 Status, Genomic Alterations, Expression Arrays, Cancer Cell Fraction, Cell Fractions, Clinical Subtype

Background: To understand the mechanisms underlying tumor evolution during metastasis, we studied 61 paired primary-relapse BC from the GEICAM ConvertHER study. While some of the metastases maintained the clinical subtype (ER/PR and HER2 status) and/or the intrinsic subtype (defined by expression arrays) of the original tumor (concordant), others exhibited a subtype shift (discordant). We aimed to identify the genomic alterations driving the metastases and, in particular, their relationship with the subtype switch.
Methods: We detected somatic variants (mutations and copy number alterations (CNAs)) affecting 202 genes across the 61 sample pairs via targeted sequencing. We used the Cancer Genome Interpreter (cancergenomeinterpreter.org), a bioinformatics tool for identifying the alterations most likely to drive tumorigenesis, and then identified those whose cancer cell fraction changed markedly in the metastases. We explored clonal remodeling in metastasis by comparing the cell fractions of driver mutations in concordant and discordant tumors.
Results: Across the 61 tumor pairs, we found 747 somatic mutations in 156 genes and 1,042 somatic CNAs in 171 genes. We identified a median of 11 and 9 mutations in primaries and metastases, respectively. Several frequent BC mutational drivers, such as TP53, PIK3CA, MLL3, MAP3K1, and NOTCH2, were among the genes whose cancer cell fraction most frequently changed in metastases relative to primaries. Driver mutations of discordant tumors showed a significantly larger increase in clonal cell fraction. Moreover, whether the clonal status of a driver mutation was conserved in the metastasis was significantly associated with whether the tumor maintained its clinical subtype, but not its intrinsic subtype.
Conclusions: Our results suggest that a shift in the clinical subtype of BC during metastasis is accompanied by more significant genomic changes than those in tumors that maintain their clinical subtype. This remodeling of the driver landscape could open new therapeutic opportunities to specifically target discordant BC.
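
The abstract does not say how clonal status or a marked CCF change was defined. A minimal sketch, assuming a hypothetical clonal cut-off of CCF >= 0.9 and made-up CCF values, of the two per-mutation quantities the Results describe: the CCF change from primary to metastasis, and whether clonal status is conserved.

```python
from dataclasses import dataclass

CLONAL_THRESHOLD = 0.9  # hypothetical cut-off; the abstract does not define "clonal"


@dataclass
class DriverMutation:
    gene: str
    ccf_primary: float
    ccf_metastasis: float

    @property
    def delta_ccf(self) -> float:
        """Change in cancer cell fraction from primary to metastasis."""
        return self.ccf_metastasis - self.ccf_primary

    @property
    def clonal_status_conserved(self) -> bool:
        """True if the mutation is clonal in both samples or subclonal in both."""
        return (self.ccf_primary >= CLONAL_THRESHOLD) == (self.ccf_metastasis >= CLONAL_THRESHOLD)


def summarize(mutations):
    """Mean CCF change and fraction of mutations whose clonal status is conserved."""
    n = len(mutations)
    mean_delta = sum(m.delta_ccf for m in mutations) / n
    conserved = sum(m.clonal_status_conserved for m in mutations) / n
    return mean_delta, conserved


if __name__ == "__main__":
    # Made-up CCFs for one primary/metastasis pair, for illustration only.
    pair = [
        DriverMutation("TP53", 0.95, 0.98),    # clonal -> clonal
        DriverMutation("PIK3CA", 0.40, 0.95),  # subclonal -> clonal
        DriverMutation("MAP3K1", 0.30, 0.35),  # subclonal -> subclonal
    ]
    print(summarize(pair))
```

Comparing these summaries between concordant and discordant pairs mirrors, in spirit, the comparison reported above.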

Dynamics of genetic variation in Transcription Factors and its implications for the evolution of regulatory networks in Bacteria

10.1101/785691
2019

Author(s): Farhan Ali, Aswin Sai Narain Seshasayee

Keyword(s): Genetic Variation, Transcription Factors, Regulatory Networks, Large Scale, Target Genes, Point Mutations, Purifying Selection, Whole Genome Sequencing Data, Sequencing Data, Global Regulators

The evolution of bacterial regulatory networks has largely been explained at macroevolutionary scales through lateral gene transfer and gene duplication. Transcription factors (TFs) have been found to be less conserved across species than their target genes (TGs). This would be expected if TFs accumulate mutations faster than TGs. This hypothesis is supported by several lab evolution studies that found TFs, especially global regulators, to be frequently mutated. Despite these studies, the contribution of point mutations in TFs to the evolution of regulatory networks is poorly understood. We tested whether TFs show greater genetic variation than their TGs using whole-genome sequencing data from a large collection of E. coli isolates. We found TFs to be less diverse across natural isolates, owing to their regulatory roles. TFs were enriched in mutations in multiple adaptive lab evolution studies but not in mutation accumulation experiments. However, over long-term evolution, the relative frequency of mutations in TFs showed a gradual decay after a rapid initial burst. Our results suggest that point mutations conferring large-scale expression changes may drive the early stages of adaptation, but that gene regulation is subject to stronger purifying selection after adaptation.
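
The abstract does not name the diversity measure used. Assuming nucleotide diversity (pi, the mean proportion of differing sites between pairs of isolates) as one standard choice, a minimal sketch of comparing TF and TG genes follows; the alignments are toy strings, not E. coli data, gaps are not handled, and the function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean proportion of differing sites over all
    pairs of equal-length, gap-free aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    n_sites = len(seqs[0])
    if n_sites == 0 or any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * n_sites)


def mean_diversity(alignments):
    """Average pi across a group of genes, each given as one alignment across isolates."""
    return sum(nucleotide_diversity(a) for a in alignments) / len(alignments)


if __name__ == "__main__":
    # Toy alignments (made up): one list per gene, one string per isolate.
    tf_genes = [["ACGTAC", "ACGTAC", "ACGTAT"]]
    tg_genes = [["ACGTAC", "ATGTAC", "ACGAAT"]]
    print(mean_diversity(tf_genes) < mean_diversity(tg_genes))  # True
```

Because pi is normalised per site, genes of different lengths can be compared directly.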

SomaticSniper: identification of somatic point mutations in whole genome sequencing data

Bioinformatics
10.1093/bioinformatics/btr665
2011
Vol 28 (3)
pp. 311-317
Cited By ~ 354

Author(s): David E. Larson, Christopher C. Harris, Ken Chen, Daniel C. Koboldt, Travis E. Abbott, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Point Mutations, Whole Genome Sequencing Data, Whole Genome, Sequencing Data

Estimation of cancer cell fractions and clone trees from multi-region sequencing of tumors

10.1101/2021.06.12.448194
2021

Author(s): Lily Zheng, Laura Wood, Rachel Karchin, Robert B Scharpf

Keyword(s): Cancer Cell, Graphical Models, Evolutionary History, Clonal Diversity, Evolutionary Relationships, Small Subset, Sequencing Data, Multiple Biopsies, Fundamental Uncertainty, Cell Fractions

Multi-region sequencing of one or multiple biopsies of solid tumors from a patient can be used to improve our understanding of the diversity of subclones in the patient's tumor and shed light on the evolutionary history of the disease. Due to the large number of possible evolutionary relationships between clones and the fundamental uncertainty of the mutational composition of subclones, elucidating the most probable evolutionary relationships poses statistical and computational challenges. We developed a Bayesian hierarchical model called PICTograph to model uncertainty in the assignment of mutations to subclones and an approach to reduce the space of possible graphical models that postulate their evolutionary origin. Compared to available methods, our approach provided more consistent and accurate estimates of cancer cell fractions and better tree topology reconstruction over a range of simulated clonal diversity. Application of PICTograph to whole exome sequencing data of individuals with pancreatic cancer precursor lesions confirmed known early occurring mutations and indicated substantial molecular diversity, including multiple distinct subclones (range 6 - 12) and intra-sample mixing of subclones. As the complete evolutionary history for some patients was not identifiable, we used ensemble-based visualizations to distinguish between highly probable evolutionary relationships recovered in multiple models from uncertain relationships occurring in a small subset of models. These analyses indicate that PICTograph provides a useful approximation to evolutionary inference, particularly when the evolutionary course of a patient's cancer is complex.
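
The ensemble idea of separating highly probable relationships from uncertain ones can be sketched by counting how often each parent-child edge appears across a set of candidate trees. This is a minimal illustration, not PICTograph's Bayesian model: the trees, clone names and the 0.5 support threshold are hypothetical.

```python
from collections import Counter


def edge_support(trees):
    """Fraction of trees in the ensemble that contain each parent -> child edge.

    Each tree is a dict mapping child clone -> parent clone (None for the root).
    """
    counts = Counter()
    for tree in trees:
        counts.update((parent, child) for child, parent in tree.items() if parent is not None)
    return {edge: n / len(trees) for edge, n in counts.items()}


def split_by_support(support, threshold=0.5):
    """Separate well-supported edges from uncertain ones (threshold is a free choice)."""
    confident = {edge for edge, s in support.items() if s >= threshold}
    return confident, set(support) - confident


if __name__ == "__main__":
    # Three hypothetical high-scoring trees over clones A-D.
    trees = [
        {"A": None, "B": "A", "C": "A", "D": "B"},
        {"A": None, "B": "A", "C": "B", "D": "B"},
        {"A": None, "B": "A", "C": "A", "D": "C"},
    ]
    confident, uncertain = split_by_support(edge_support(trees))
    print(sorted(confident))  # [('A', 'B'), ('A', 'C'), ('B', 'D')]
    print(sorted(uncertain))  # [('B', 'C'), ('C', 'D')]
```

Edges present in every tree, such as A to B here, correspond to relationships recovered in multiple models; edges seen in only one tree correspond to the uncertain relationships that occur in a small subset of models.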

Prediction of antimicrobial resistance in clinical Campylobacter jejuni isolates from whole-genome sequencing data

European Journal of Clinical Microbiology & Infectious Diseases
10.1007/s10096-020-04043-y
2020

Author(s): Louise Gade Dahl, Katrine Grimstrup Joensen, Mark Thomas Østerlund, Kristoffer Kiil, Eva Møller Nielsen

Keyword(s): Antimicrobial Resistance, Whole Genome Sequencing, Campylobacter Jejuni, Genome Sequencing, Resistance Genes, Point Mutations, 23S Rrna, Whole Genome Sequencing Data, Whole Genome, Sequencing Data

Campylobacter jejuni is recognised as the leading cause of bacterial gastroenteritis in industrialised countries. Although the majority of Campylobacter infections are self-limiting, antimicrobial treatment is necessary in severe cases. The development of antimicrobial resistance (AMR) in Campylobacter is therefore a growing public health challenge, and surveillance of AMR is important for bacterial disease control. The aim of this study was to predict antimicrobial resistance in C. jejuni from whole-genome sequencing (WGS) data. A total of 516 clinical C. jejuni isolates collected between 2014 and 2017 were subjected to WGS. Resistance phenotypes were determined by standard broth dilution, categorising isolates as either susceptible or resistant based on epidemiological cut-offs for six antimicrobials: ciprofloxacin, nalidixic acid, erythromycin, gentamicin, streptomycin, and tetracycline. Resistance genotypes were identified using an in-house database of reference genes with known point mutations, and the presence of resistance genes was determined using the ResFinder database and four bioinformatic methods (modified KMA, ABRicate, ARIBA, and ResFinder Batch Upload). We identified seven resistance genes (tet(O), tet(O/32/O), ant(6)-Ia, aph(2″)-If, blaOXA, aph(3′)-III, and cat) as well as mutations in three genes: gyrA, 23S rRNA, and rpsL. There was a high correlation between phenotypic resistance and the presence of known resistance genes and/or point mutations: above 98% for all antimicrobials except streptomycin (92%). In conclusion, we found that WGS can predict antimicrobial resistance with a high degree of accuracy and has the potential to be a powerful tool for AMR surveillance.
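
A minimal sketch of genotype-based resistance prediction and its agreement with phenotype, assuming an isolate is called resistant whenever it carries any known determinant for a drug. The determinant table is illustrative only (tet(O) and tet(O/32/O) for tetracycline, and the label gyrA_T86I for the Thr-86-Ile GyrA substitution commonly linked to fluoroquinolone resistance), not the study's in-house database; the isolates are made up, and percent agreement is assumed here as the meaning of the reported "correlation".

```python
# Illustrative determinant lists only; NOT the study's in-house database.
DETERMINANTS = {
    "tetracycline": {"tet(O)", "tet(O/32/O)"},
    "ciprofloxacin": {"gyrA_T86I"},  # example label for a gyrA point mutation
}


def predict_resistant(genotype, antimicrobial):
    """Call an isolate resistant if it carries any listed determinant for the drug."""
    return bool(genotype & DETERMINANTS[antimicrobial])


def concordance(isolates, antimicrobial):
    """Share of isolates whose predicted and phenotypic (broth dilution) calls agree."""
    agree = sum(predict_resistant(genes, antimicrobial) == pheno[antimicrobial]
                for genes, pheno in isolates)
    return agree / len(isolates)


if __name__ == "__main__":
    # Made-up isolates: (detected determinants, phenotype per drug; True = resistant).
    isolates = [
        ({"tet(O)", "gyrA_T86I"}, {"tetracycline": True, "ciprofloxacin": True}),
        (set(), {"tetracycline": False, "ciprofloxacin": False}),
        ({"gyrA_T86I"}, {"tetracycline": False, "ciprofloxacin": True}),
        ({"tet(O/32/O)"}, {"tetracycline": True, "ciprofloxacin": True}),
    ]
    for drug in DETERMINANTS:
        print(drug, concordance(isolates, drug))
```

The last toy isolate is phenotypically ciprofloxacin-resistant without a listed determinant, so ciprofloxacin agreement drops to 0.75 while tetracycline agreement stays at 1.0.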

DeCiFering the Elusive Cancer Cell Fraction in Tumor Heterogeneity and Evolution

10.1101/2021.02.27.429196
2021

Author(s): Gryte Satas, Simone Zaccaria, Mohammed El-Kebir, Benjamin J. Raphael

Keyword(s): Phylogenetic Analysis, Cancer Cells, Cancer Cell, Copy Number, Tumor Heterogeneity, Cell Fraction, Tumor Evolution, Sequencing Data, Single Nucleotide Variants, Cancer Cell Fraction

Most tumors are heterogeneous mixtures of normal cells and cancer cells, with individual cancer cells distinguished by somatic mutations that accumulated during the evolution of the tumor. The fundamental quantity used to measure tumor heterogeneity from somatic single-nucleotide variants (SNVs) is the Cancer Cell Fraction (CCF), or proportion of cancer cells that contain the SNV. However, in tumors containing copy-number aberrations (CNAs) (e.g., most solid tumors), the estimation of CCFs from DNA sequencing data is challenging because a CNA may alter the mutation multiplicity, or number of copies of an SNV. Existing methods to estimate CCFs rely on the restrictive Constant Mutation Multiplicity (CMM) assumption that the mutation multiplicity is constant across all tumor cells containing the mutation. However, the CMM assumption is commonly violated in tumors containing CNAs, and thus CCFs computed under the CMM assumption may yield unrealistic conclusions about tumor heterogeneity and evolution. The CCF also has a second limitation for phylogenetic analysis: the CCF measures the presence of a mutation at the present time, but SNVs may be lost during the evolution of a tumor due to deletions of chromosomal segments. Thus, SNVs that co-occur on the same phylogenetic branch may have different CCFs.
In this work, we address these limitations of the CCF in two ways. First, we show how to compute the CCF of an SNV under a less restrictive and more realistic assumption called the Single Split Copy Number (SSCN) assumption. Second, we introduce a novel statistic, the descendant cell fraction (DCF), that quantifies both the prevalence of an SNV and the past evolutionary history of SNVs under an evolutionary model that allows for mutation losses. That is, SNVs that co-occur on the same phylogenetic branch will have the same DCF. We implement these ideas in an algorithm named DeCiFer.
DeCiFer computes the DCFs of SNVs from read counts and copy-number proportions and also infers clusters of mutations that are suitable for phylogenetic analysis. We show that DeCiFer clusters SNVs more accurately than existing methods on simulated data containing mutation losses. We apply DeCiFer to sequencing data from 49 metastatic prostate cancer samples and show that DeCiFer produces more parsimonious and reasonable reconstructions of tumor evolution compared to previous approaches. Thus, DeCiFer enables more accurate quantification of intra-tumor heterogeneity and improves downstream inference of tumor evolution.
Code availability: Software is available at https://github.com/raphael-group/decifer
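
To illustrate why the CMM assumption matters, the following sketch (not DeCiFer's SSCN or DCF computation) lists the CCF implied by each possible constant multiplicity m for a single SNV, using VAF = p*m*CCF / (p*CN + 2*(1 - p)) with purity p and total copy number CN assumed known. Multiplicities are simply taken as 1 to CN, a simplification, and the input values are made up; `candidate_ccfs` is a hypothetical helper.

```python
def candidate_ccfs(vaf, purity, total_cn, normal_cn=2.0, tol=1e-9):
    """CCF implied by each possible constant mutation multiplicity m = 1..total_cn.

    Uses VAF = p*m*CCF / (p*CN + normal_cn*(1 - p)) and keeps only admissible
    values (CCF <= 1). Several surviving entries mean the CMM-based CCF is ambiguous.
    """
    mixed_cn = purity * total_cn + normal_cn * (1.0 - purity)
    out = {}
    for m in range(1, int(total_cn) + 1):
        ccf = vaf * mixed_cn / (purity * m)
        if ccf <= 1.0 + tol:
            out[m] = ccf
    return out


if __name__ == "__main__":
    # Toy SNV: VAF 0.2 in a region with 4 tumour copies, purity 0.8.
    for m, ccf in candidate_ccfs(0.2, purity=0.8, total_cn=4).items():
        print(f"multiplicity {m}: CCF {ccf:.3f}")
```

The same VAF reads as a near-clonal SNV on one copy (CCF 0.9) or a subclonal SNV on two or more copies, which is the kind of ambiguity the abstract attributes to CNAs.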

A new method towards calculating the cancer cell fraction in cell-free DNA.

Journal of Clinical Oncology
10.1200/jco.2019.37.15_suppl.e13053
2019
Vol 37 (15_suppl)
pp. e13053-e13053

Author(s): Tiancheng Han, Jianing Yu, Xiaojing Lin, Hongyu Xie, Xue Song, ...

Keyword(s): Allele Frequency, Cancer Cell, Circulating Tumor Dna, Accurate Estimation, Sequencing Data, Cell Free Dna, Target Sequencing, Paired Samples, Free Dna, Cell Fractions

Background: Circulating tumor DNA (ctDNA) has shown potential for early- and late-stage cancer detection, tumor genotyping and post-operative recurrence monitoring. The fraction of ctDNA in cell-free DNA (denoted ccf here), in addition to standard SNV/INDEL/CNV analysis, has also been shown to be associated with tumor progression and prognosis. In principle, an accurate ccf could also be used to correct and improve SNV/INDEL/CNV results. Existing tools capable of calculating ccf (PureCN, FACETS, Sequenza, etc.) use coverage data in targeted regions and SNP allele frequencies to calculate the tumor fraction, but fail to give accurate estimates at relatively low ctDNA concentrations.
Methods: A maximum likelihood model was built to estimate ccf. We first selected informative SNPs with significantly different variant allele frequencies (VAFs) between the case and paired control samples. The mutation type of an informative SNP was determined by its VAF in the paired samples and the copy number of the case sample. The likelihood of each SNP given a specific ccf was then calculated. After clustering SNPs into clones, the ccf of each clone was estimated using a global likelihood.
Results: Performance of the method was validated by ctDNA dilution series analysis. Six cfDNA samples from cancer patients were diluted (concentrations 1/3 to 1/81). The detection limit of the method is ~2%, and the correlation between estimated and expected ccf ranged from 0.93 to 0.98.
Conclusions: We have developed a novel method to better estimate cancer cell fractions in cell-free DNA. Our results show that the method can calculate ccf at lower ctDNA concentrations with higher accuracy and stability than the benchmarked tools. We describe here a method for targeted sequencing data that is more sensitive, accurate and stable than currently available tools.
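
The abstract describes a maximum likelihood estimate of ccf from informative SNPs. A minimal single-clone sketch follows, assuming germline-heterozygous SNPs with known tumour copy-number states, a diploid normal cfDNA background, binomial read counts and a grid search over ccf. SNP selection, mutation-type assignment and clone clustering from the abstract are not reproduced, the function names are hypothetical, and the counts are made up.

```python
import math


def expected_af(f, tumor_cn, tumor_allele_copies):
    """Expected fraction of reads carrying one allele of a germline-heterozygous SNP
    when a fraction f of cfDNA is tumour-derived. Normal cfDNA carries 2 copies,
    1 of them this allele; tumour cfDNA carries `tumor_allele_copies` of `tumor_cn`."""
    num = f * tumor_allele_copies + (1.0 - f) * 1.0
    den = f * tumor_cn + (1.0 - f) * 2.0
    return num / den


def log_likelihood(f, snps):
    """Binomial log-likelihood (up to a constant) of allele counts given tumour fraction f.
    Each SNP is (allele_reads, depth, tumor_cn, tumor_allele_copies)."""
    ll = 0.0
    for k, n, cn, copies in snps:
        q = min(max(expected_af(f, cn, copies), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        ll += k * math.log(q) + (n - k) * math.log(1.0 - q)
    return ll


def estimate_fraction(snps, step=0.001):
    """Grid-search maximum likelihood estimate of the tumour fraction in (0, 1)."""
    grid = [i * step for i in range(1, int(round(1.0 / step)))]
    return max(grid, key=lambda f: log_likelihood(f, snps))


if __name__ == "__main__":
    # Toy counts consistent with f = 0.2 (made up, not data from the study).
    snps = [
        (500, 900, 1, 1),   # one-copy loss; tracked allele retained
        (400, 900, 1, 0),   # one-copy loss; tracked allele lost
        (600, 1000, 2, 2),  # copy-neutral LOH; tracked allele duplicated
    ]
    print(estimate_fraction(snps))
```

Each SNP's likelihood peaks where its expected allele fraction matches the observed one, so SNPs in different copy-number states pool their evidence on a single fraction.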
