A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa672 ◽

2020 ◽

Author(s):

Dimitrios V Vavoulis ◽

Anthony Cutts ◽

Jenny C Taylor ◽

Anna Schuh

Keyword(s):

Dirichlet Process ◽

Model Performance ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Model Parameters ◽

Dirichlet Process Mixture ◽

Sequencing Data ◽

Sample Collection ◽

Liquid Biopsies ◽

Cross Sectional

Abstract

Motivation: Tumours are composed of distinct cancer cell populations (clones), which continuously adapt to their local micro-environment. Standard methods for clonal deconvolution seek to identify groups of mutations and estimate the prevalence of each group in the tumour, while considering its purity and copy number profile. These methods have been applied to cross-sectional data and to longitudinal data after discarding information on the timing of sample collection. Two key questions are how we can incorporate such information in our analyses and whether there is any benefit in doing so.

Results: We developed a clonal deconvolution method that explicitly incorporates the temporal spacing of longitudinally sampled tumours. By merging a Dirichlet Process Mixture Model with Gaussian Process priors and using as input a sequence of several sparsely collected samples, our method can reconstruct the temporal profile of the abundance of any mutation cluster supported by the data as a continuous function of time. We benchmarked our method on whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma and on synthetic data. We found that incorporating information on the timing of tissue collection improves model performance, as long as data of sufficient volume and complexity are available for estimating free model parameters. Thus, our approach is particularly useful when collecting a relatively long sequence of tumour samples is feasible, as in liquid cancers (e.g. leukaemia) and liquid biopsies.

Availability and implementation: The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP.

Supplementary information: Supplementary data are available at Bioinformatics online.
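To make concrete what "estimating prevalence while considering purity and copy number" can involve, the following is a minimal sketch of one commonly used relation between a mutation's cancer cell fraction and its expected variant allele fraction (VAF). It is an illustration under stated assumptions, not necessarily the parameterisation used by the authors: the function name `expected_vaf` and all parameter names are hypothetical, normal cells are assumed diploid, and the number of mutated copies per cancer cell is assumed known.

```python
def expected_vaf(ccf, purity, total_cn_tumour, mutated_copies=1, total_cn_normal=2):
    """Expected variant allele fraction of a mutation in a bulk sample.

    ccf             -- fraction of cancer cells carrying the mutation (0..1)
    purity          -- fraction of cells in the sample that are cancerous (0..1]
    total_cn_tumour -- total copy number at the locus in cancer cells
    mutated_copies  -- copies carrying the mutation per mutated cell (assumed known)
    total_cn_normal -- total copy number in normal cells (assumed diploid)
    """
    if not 0.0 <= ccf <= 1.0:
        raise ValueError("ccf must lie in [0, 1]")
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must lie in (0, 1]")
    # Mutant alleles contributed per cell, averaged over the whole sample.
    mutant_alleles = purity * ccf * mutated_copies
    # All alleles at the locus per cell, averaged over tumour and normal cells.
    total_alleles = purity * total_cn_tumour + (1.0 - purity) * total_cn_normal
    return mutant_alleles / total_alleles


# A clonal heterozygous mutation in a pure diploid tumour.
clonal_vaf = expected_vaf(ccf=1.0, purity=1.0, total_cn_tumour=2)
# A subclonal mutation (half of cancer cells) in an 80% pure diploid tumour.
subclonal_vaf = expected_vaf(ccf=0.5, purity=0.8, total_cn_tumour=2)
```

Under these assumptions, lower purity and lower cancer cell fraction both push the expected VAF down, which is why deconvolution methods need purity and copy number to interpret observed read counts.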

A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data

10.1101/2020.01.20.913236 ◽

2020 ◽

Author(s):

Dimitrios V. Vavoulis ◽

Anthony Cutts ◽

Jenny C. Taylor ◽

Anna Schuh

Keyword(s):

Next Generation Sequencing ◽

Longitudinal Data ◽

Darwinian Evolution ◽

Next Generation Sequencing Data ◽

Model Parameters ◽

Next Generation ◽

Sequencing Data ◽

Sample Collection ◽

Liquid Biopsies ◽

Generation Sequencing

Abstract

Tumours are composed of genotypically and phenotypically distinct cancer cell populations (clones), which are subject to a process of Darwinian evolution in response to changes in their local micro-environment, such as drug treatment. In a cancer patient, this process of continuous adaptation can be studied through next-generation sequencing of multiple tumour samples combined with appropriate bioinformatics and statistical methodologies. One family of statistical methods for clonal deconvolution seeks to identify groups of mutations and estimate the prevalence of each group in the tumour, while taking into account its purity and copy number profile. These methods have been used in the analysis of cross-sectional data, as well as for longitudinal data by discarding information on the timing of sample collection. Two key questions are how (in the case of longitudinal data) we can incorporate such information in our analyses and whether there is any benefit in doing so. Regarding the first question, we incorporated information on the temporal spacing of longitudinally collected samples into standard non-parametric approaches for clonal deconvolution by modelling the time dependence of the prevalence of each clone as a Gaussian process. This permitted reconstruction of the temporal profile of the abundance of each clone continuously from several sparsely collected samples and without any strong prior assumptions on the functional form of this profile. Regarding the second question, we tested various model configurations on a range of whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma and on synthetic data. We demonstrate that incorporating temporal information in our analysis improves model performance, as long as data of sufficient volume and complexity are available for estimating free model parameters. We expect that our approach will be useful in cases where collecting a relatively long sequence of tumour samples is feasible, as in the case of liquid cancers (e.g. leukaemia) and liquid biopsies. The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP.
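As a rough illustration of how a Gaussian process can encode the temporal spacing of samples, the sketch below builds a squared-exponential covariance matrix over unevenly spaced collection times. This is a generic textbook kernel chosen for illustration, not necessarily the one used in the paper; the function names, the sampling days and the length-scale value are all hypothetical.

```python
import math


def squared_exponential(t1, t2, amplitude=1.0, lengthscale=100.0):
    """Covariance between the latent clone abundance at times t1 and t2.

    Samples taken close together in time (relative to `lengthscale`)
    are strongly correlated; distant samples are nearly independent.
    """
    scaled_distance = (t1 - t2) / lengthscale
    return amplitude ** 2 * math.exp(-0.5 * scaled_distance ** 2)


def kernel_matrix(times, amplitude=1.0, lengthscale=100.0):
    """Full covariance matrix over a list of sample collection times."""
    return [
        [squared_exponential(a, b, amplitude, lengthscale) for b in times]
        for a in times
    ]


# Hypothetical collection times in days since the first sample.
sample_days = [0.0, 30.0, 45.0, 400.0]
K = kernel_matrix(sample_days)

for row in K:
    print(" ".join(f"{value:.3f}" for value in row))
```

With these assumed values, the samples at days 30 and 45 are almost perfectly correlated, while the sample at day 400 is nearly uncorrelated with the rest. A model that discards sampling times cannot express this distinction, which is the kind of information the abstract describes incorporating.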

VikNGS: A C++ Variant Integration Kit for Next Generation Sequencing Association Analysis

Bioinformatics ◽

10.1093/bioinformatics/btz716 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zeynep Baskurt ◽

Scott Mastromatteo ◽

Jiafen Gong ◽

Richard F Wintle ◽

Stephen W Scherer ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genetic Association ◽

Association Analysis ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Next Generation ◽

Sequencing Data ◽

Combining Data ◽

Generation Sequencing

Abstract

Summary: Integration of next generation sequencing (NGS) data across different research studies can improve the power of genetic association testing by increasing sample size and can obviate the need for sequencing controls. If differential genotype uncertainty across studies is not accounted for, combining data sets can produce spurious association results. We developed the Variant Integration Kit for NGS (VikNGS), a fast cross-platform software package, to enable aggregation of several data sets for rare and common variant genetic association analysis of quantitative and binary traits with covariate adjustment. VikNGS also includes a graphical user interface, power simulation functionality and data visualization tools.

Availability: The VikNGS package can be downloaded at http://www.tcag.ca/tools/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

ScaleQC: a scalable lossy to lossless solution for NGS data compression

Bioinformatics ◽

10.1093/bioinformatics/btaa543 ◽

2020 ◽

Vol 36 (17) ◽

pp. 4551-4559 ◽

Cited By ~ 1

Author(s):

Rongshan Yu ◽

Wenxian Yang

Keyword(s):

Lossless Compression ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Source Codes ◽

Compression Performance ◽

Data Rates ◽

Quality Value ◽

Ngs Data ◽

Bit Stream

Abstract

Motivation: Per-base quality values in Next Generation Sequencing data take up a significant portion of storage even after compression. Lossy compression technologies could further reduce the space used by quality values. However, in many applications, lossless compression is still desired. Hence, sequencing data in multiple file formats have to be prepared for different applications.

Results: We developed a scalable lossy to lossless compression solution for quality values named ScaleQC (Scalable Quality value Compression). ScaleQC provides so-called bit-stream level scalability: the losslessly compressed bit-stream produced by ScaleQC can be further truncated to lower data rates without incurring an expensive transcoding operation. Despite its scalability, ScaleQC still achieves compression performance comparable to that of existing lossless or lossy compressors at both lossless and lossy data rates.

Availability and implementation: ScaleQC has been integrated with SAMtools as a special quality value encoding mode for CRAM. Its source code can be obtained from our integrated SAMtools (https://github.com/xmuyulab/samtools) with dependency on integrated HTSlib (https://github.com/xmuyulab/htslib).

Supplementary information: Supplementary data are available at Bioinformatics online.

ABEMUS: platform-specific and data-informed detection of somatic SNVs in cfDNA

Bioinformatics ◽

10.1093/bioinformatics/btaa016 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2665-2674

Author(s):

Nicola Casiraghi ◽

Francesco Orlando ◽

Yari Ciani ◽

Jenny Xiang ◽

Andrea Sboner ◽

...

Keyword(s):

Cancer Patients ◽

R Package ◽

Circulating Tumor Dna ◽

Supplementary Information ◽

Sequencing Error ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Liquid Biopsies ◽

Non Invasive ◽

Cross Platform

Abstract

Motivation: The use of liquid biopsies for cancer patients enables the non-invasive tracking of treatment response and tumor dynamics through single or serial blood draw tests. Next-generation sequencing assays allow for the simultaneous interrogation of extended sets of somatic single-nucleotide variants (SNVs) in circulating cell-free DNA (cfDNA), a mixture of DNA molecules originating from both normal and tumor tissue cells. However, low circulating tumor DNA (ctDNA) fractions, together with sequencing background noise and potential tumor heterogeneity, challenge the ability to confidently call SNVs.

Results: We present a computational methodology, called Adaptive Base Error Model in Ultra-deep Sequencing data (ABEMUS), which combines platform-specific genetic knowledge and empirical signal to readily detect and quantify somatic SNVs in cfDNA. We tested the capability of our method to analyze data generated using different platforms with distinct sequencing error properties, and we compared the performance of ABEMUS with that of other popular SNV callers on both synthetic and real cancer patient sequencing data. Results show that ABEMUS performs better in most of the tested conditions, proving its reliability in calling somatic SNVs with low variant allele frequencies in plasma samples with low ctDNA levels.

Availability and implementation: ABEMUS is cross-platform and can be installed as an R package. The source code is maintained on GitHub at http://github.com/cibiobcg/abemus, and it is also available from the official CRAN R repository.

Supplementary information: Supplementary data are available at Bioinformatics online.

Algorithmic methods to infer the evolutionary trajectories in cancer progression

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1520213113 ◽

2016 ◽

Vol 113 (28) ◽

pp. E4025-E4034 ◽

Cited By ~ 38

Author(s):

Giulio Caravagna ◽

Alex Graudenzi ◽

Daniele Ramazzotti ◽

Rebeca Sanz-Pamplona ◽

Luca De Sano ◽

...

Keyword(s):

Cancer Progression ◽

Current Knowledge ◽

Population Level ◽

Selective Advantage ◽

Explanatory Models ◽

Next Generation Sequencing Data ◽

Driver Mutations ◽

Sequencing Data ◽

Cross Sectional ◽

Progression Model

The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the “selective advantage” relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc’s ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses.

A Bayesian Nonparametric Model for Inferring Subclonal Populations from Structured DNA Sequencing Data

10.1101/2020.11.10.330183 ◽

2020 ◽

Author(s):

Shai He ◽

Aaron Schein ◽

Vishal Sarsani ◽

Patrick Flaherty

Keyword(s):

Dna Sequencing ◽

Single Cell ◽

Dirichlet Process ◽

Lymphoblastic Leukemia ◽

Nonparametric Model ◽

Dirichlet Process Mixture ◽

Sequencing Data ◽

Hierarchical Dirichlet Process ◽

Dirichlet Process Prior

There are distinguishing features or “hallmarks” of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive genetic heterogeneity as evidenced by single-cell and bulk DNA sequencing data. The goal of this work is to jointly infer the underlying genotypes of tumor subpopulations and the distribution of those subpopulations in individual tumors by integrating single-cell and bulk sequencing data. Understanding the genetic composition of the tumor at the time of treatment is important in the personalized design of targeted therapeutic combinations and monitoring for possible recurrence after treatment.

We propose a hierarchical Dirichlet process mixture model that incorporates the correlation structure induced by a structured sampling arrangement and we show that this model improves the quality of inference. We develop a representation of the hierarchical Dirichlet process prior as a Gamma-Poisson hierarchy and we use this representation to derive a fast Gibbs sampling inference algorithm using the augment-and-marginalize method. Experiments with simulation data show that our model outperforms standard numerical and statistical methods for decomposing admixed count data. Analysis of a real acute lymphoblastic leukemia sequencing dataset shows that our model improves upon state-of-the-art bioinformatic methods. An interpretation of the results of our model on this real dataset reveals co-mutated loci across samples.
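For readers unfamiliar with Dirichlet process priors, the sketch below draws mixture weights from a single (non-hierarchical) Dirichlet process using the standard truncated stick-breaking construction. It is only meant to convey how such a prior spreads probability mass over an open-ended number of subpopulations; it is not the Gamma-Poisson representation or the Gibbs sampler developed in the paper. The function name, the truncation level and the concentration value are hypothetical choices.

```python
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Draw mixture weights from a truncated stick-breaking process.

    alpha        -- concentration parameter; larger values spread the
                    mass over more components
    n_components -- truncation level (an approximation to the
                    infinite-dimensional process)
    rng          -- a random.Random instance, for reproducibility
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(n_components - 1):
        fraction = rng.betavariate(1.0, alpha)  # share broken off this step
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # last component absorbs the leftover mass
    return weights


rng = random.Random(0)
weights = stick_breaking_weights(alpha=1.0, n_components=10, rng=rng)
print([round(w, 3) for w in weights])
```

In a mixture model, each weight would be paired with the parameters of one candidate subpopulation; components with negligible weight are effectively unused, so the number of populated clusters is inferred from the data rather than fixed in advance.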

Pisces: An Accurate and Versatile Variant Caller for Somatic and Germline Next-Generation Sequencing Data

10.1101/291641 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tamsen Dunn ◽

Gwenn Berry ◽

Dorothea Emig-Agius ◽

Yu Jiang ◽

Serena Lei ◽

...

Keyword(s):

Next Generation Sequencing ◽

Gene Mutations ◽

Variant Calling ◽

Amplicon Sequencing ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Ras Gene ◽

Generation Sequencing

Abstract

Motivation: Next-Generation Sequencing (NGS) technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a method for accurately detecting somatic and germline variants, suitable for the clinic.

Results: We have developed Pisces, a rapid, versatile and accurate small variant calling suite designed for somatic and germline amplicon sequencing applications. Pisces accuracy is achieved by four distinct modules: the Pisces Read Stitcher, the Pisces Variant Caller, the Pisces Variant Quality Recalibrator, and the Pisces Variant Phaser. Each module incorporates a number of novel algorithmic strategies aimed at reducing noise or increasing the likelihood of detecting a true variant.

Availability: Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces. Pisces is available on the BaseSpace™ SequenceHub as part of the TruSeq Amplicon workflow and the Illumina Ampliseq Workflow. Pisces is distributed on Illumina sequencing platforms such as the MiSeq™, and is included in the Praxis™ Extended RAS Panel test which was recently approved by the FDA for the detection of multiple RAS gene mutations.

Supplementary information: Supplementary data are available online.

An information-theoretic approach for measuring the distance of organ tissue samples using their transcriptomic signatures

10.1101/2020.01.23.917245 ◽

2020 ◽

Author(s):

Dimitris V. Manatakis ◽

Aaron VanDevender ◽

Elias S. Manolakos

Keyword(s):

Ex Vivo ◽

Practical Importance ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Theoretic Approach ◽

Sequencing Data ◽

Tissue Samples ◽

Human Organ ◽

Information Theoretic ◽

Organ Models

Abstract

Motivation: Recapitulating aspects of human organ functions using in-vitro (e.g., plates, transwells, etc.), in-vivo (e.g., mouse, rat, etc.), or ex-vivo (e.g., organ chips, 3D systems, etc.) organ models is of paramount importance for precision medicine and drug discovery. It will allow us to identify potential side effects and test the effectiveness of therapeutic approaches early in their design phase and will inform the development of accurate disease models. Developing mathematical methods to reliably compare the “distance/similarity” of organ models from/to the real human organ they represent is an understudied problem with important applications in biomedicine and tissue engineering.

Results: We introduce the Transcriptomic Signature Distance, TSD, an information-theoretic distance for assessing the transcriptomic similarity of two tissue samples, or two groups of tissue samples. In developing TSD, we leverage next-generation sequencing data and information retrieved from well-curated databases providing signature gene sets characteristic of human organs. We present the justification and mathematical development of the new distance and demonstrate its effectiveness in different scenarios of practical importance using several publicly available RNA-seq datasets.

Supplementary information: Supplementary data are available at bioRxiv.

Wx: a neural network-based feature selection algorithm for next-generation sequencing data

10.1101/221911 ◽

2017 ◽

Cited By ~ 1

Author(s):

Sungsoo Park ◽

Bonggun Shin ◽

Yoonjung Choi ◽

Kilsoo Kang ◽

Keunsoo Kang

Keyword(s):

Neural Network ◽

Gene Expression ◽

Next Generation Sequencing ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Selection Algorithm ◽

Sequencing Data ◽

Optimal Set ◽

Generation Sequencing

Abstract

Motivation: Next-generation sequencing (NGS), which allows the simultaneous sequencing of billions of DNA fragments, has revolutionized how we study genomics and molecular biology by generating genome-wide molecular maps of molecules of interest. For example, an NGS-based transcriptomic assay called RNA-seq can be used to estimate the abundance of approximately 190,000 transcripts together. As the cost of next-generation sequencing sharply declines, researchers in many fields have been conducting research using NGS. The amount of information produced by NGS has made it difficult for researchers to choose the optimal set of target genes (or genomic loci).

Results: We have sought to resolve this issue by developing a neural network-based feature (gene) selection algorithm called Wx. The Wx algorithm ranks genes based on the discriminative index (DI) score that represents the classification power for distinguishing given groups. With a gene list ranked by DI score, researchers can intuitively select the optimal set of genes from the highest-ranking ones. We applied the Wx algorithm to a TCGA pan-cancer gene-expression cohort to identify an optimal set of gene-expression biomarker (universal gene-expression biomarkers) candidates that can distinguish cancer samples from normal samples for 12 different types of cancer. The 14 gene-expression biomarker candidates identified by Wx were comparable to or outperformed previously reported universal gene expression biomarkers, highlighting the usefulness of the Wx algorithm for next-generation sequencing data. Thus, we anticipate that the Wx algorithm can complement current state-of-the-art analytical applications for the identification of biomarker candidates as an alternative method.

Availability: https://github.com/deargen/...

Supplementary information: Supplementary data are available online.

Innovation and Advances in Precision Medicine in Head and Neck Cancer

Critical Issues in Head and Neck Oncology ◽

10.1007/978-3-030-63234-2_24 ◽

2021 ◽

pp. 355-373

Author(s):

Geoffrey Alan Watson ◽

Kirsty Taylor ◽

Lillian L. Siu

Keyword(s):

Head And Neck Cancer ◽

Head And Neck ◽

Precision Medicine ◽

Neck Cancer ◽

Target Identification ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Liquid Biopsies ◽

Cell Therapies ◽

Personalized Cancer

Abstract

The clinical utility of precision medicine through molecular characterization of tumors has been demonstrated in some malignancies, especially in cases where oncogenic driver alterations are identified. Next generation sequencing data from thousands of patients with head and neck cancers have provided vast amounts of information about the genomic landscape of this disease. Thus far, only a limited number of genomic alterations have been druggable, such as NTRK gene rearrangements in salivary gland cancers (mainly mammary analogue secretory carcinoma), NOTCH mutations in adenoid cystic cancers, and HRAS mutations in head and neck squamous cell cancers, and an even smaller number of these have reached regulatory approval status. In order to expand the scope of precision medicine in head and neck cancer, additional evaluation beyond genomics is necessary. For instance, there is increasing interest in performing transcriptomic profiling for target identification. Another advance is in the area of functional testing such as small interfering RNA and drug libraries on patient derived cell cultures. Liquid biopsies to detect specific tumor clones or subclones, or viral sequences such as HPV, are of great interest to enable non-invasive tracking of response or resistance to treatment. In addition, precision immuno-oncology is a tangible goal, with a growing body of knowledge on the interactions between the host immunity, the tumor and its microenvironment. Immuno-oncology combinations that are tailored to immunophenotypes of the host-tumor-microenvironment triad, personalized cancer vaccines, and adoptive cell therapies, among others, are in active development. Many therapeutic possibilities and opportunities lie ahead that ultimately will increase the reality of precision medicine in head and neck cancer.
