Power and pitfalls of computational methods for inferring clone phylogenies and mutation orders from bulk sequencing data

2019 ◽  
Author(s):  
Sayaka Miura ◽  
Tracy Vu ◽  
Jiamin Deng ◽  
Tiffany Buturla ◽  
Jiyeong Choi ◽  
...  

Abstract Background: Tumors harbor extensive genetic heterogeneity in the form of distinct clone genotypes that arise over time and across different tissues and regions of a cancer patient. Many computational methods produce clone phylogenies from population bulk sequencing data collected from multiple tumor samples. These clone phylogenies are used to infer mutation order and clone origin times during tumor progression, which makes the choice of clonal deconvolution method critical. Surprisingly, the absolute and relative accuracies of these methods in correctly inferring clone phylogenies have not been consistently assessed. Methods: We evaluated the performance of seven computational methods in producing clone phylogenies for simulated datasets in which clones were sampled from multiple sectors of a primary tumor (multi-region) or from primary and metastatic tumors in a patient (multi-site). We assessed the accuracy of the tested methods in determining the order of mutations and the branching pattern within the reconstructed clone phylogenies. Results: The accuracy of the reconstructed mutation order varied extensively among methods (9%–44% error). Methods also varied significantly in reconstructing the topologies of clone phylogenies, as 24%–58% of the inferred clone groupings were incorrect. All the tested methods showed limited ability to correctly identify ancestral clone sequences present in tumor samples. The occurrence of multiple seeding events among tumor sites during metastatic tumor evolution hindered deconvolution of clones for all tested methods. Conclusions: CloneFinder, MACHINA, and LICHeE showed the highest overall accuracy, but none of the methods performed well for all simulated datasets and conditions.
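As an illustration of what a mutation-order error can mean, the following minimal sketch counts the fraction of true ancestor-descendant mutation pairs that an inferred tree fails to recover. The abstract does not define the metrics used in the study, so this metric, the function names, and the toy trees are all assumptions made for illustration only.

```python
def ancestor_pairs(parent):
    """Return all ordered (earlier, later) mutation pairs implied by a tree.

    `parent` maps each mutation to the mutation on its parent branch
    (None for the root). This encoding is a simplifying assumption.
    """
    pairs = set()
    for mutation in parent:
        ancestor = parent[mutation]
        while ancestor is not None:
            pairs.add((ancestor, mutation))
            ancestor = parent[ancestor]
    return pairs


def mutation_order_error(true_parent, inferred_parent):
    """Fraction of true ancestor-descendant pairs missing from the inferred tree."""
    true_pairs = ancestor_pairs(true_parent)
    if not true_pairs:
        return 0.0
    missed = true_pairs - ancestor_pairs(inferred_parent)
    return len(missed) / len(true_pairs)


# Hypothetical toy example: the inferred tree places C directly under A,
# losing the true ordering "B occurred before C".
true_tree = {"A": None, "B": "A", "C": "B", "D": "A"}
inferred_tree = {"A": None, "B": "A", "C": "A", "D": "A"}
error = mutation_order_error(true_tree, inferred_tree)
print(f"mutation order error: {error:.2f}")
```

In this toy case one of the four true orderings is lost, so the error is 25%. A fuller comparison would also penalize orderings that the inferred tree asserts but the true tree does not.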

2015 ◽  
Author(s):  
Andrea Sottoriva ◽  
Trevor Graham

Despite extraordinary efforts to profile cancer genomes on a large scale, interpreting the vast amount of genomic data in the light of cancer evolution and in a clinically relevant manner remains challenging. Here we demonstrate that cancer next-generation sequencing data is dominated by the signature of growth governed by a power-law distribution of mutant allele frequencies. The power-law signature is common to multiple tumor types and is a consequence of the effectively-neutral evolutionary dynamics that underpin the evolution of a large proportion of cancers, giving rise to the abundance of mutations responsible for intra-tumor heterogeneity. Importantly, the law allows the measurement, in each individual cancer, of the in vivo mutation rate and the timing of mutations with remarkable precision. This result provides a new way to interpret cancer genomic data by considering the physics of tumor growth in a way that is both patient-specific and clinically relevant.
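The abstract does not state the power law explicitly. A minimal sketch, assuming the commonly used neutral-growth form in which the cumulative number of mutations with allele frequency between f and f_max is M(f) = (mu/beta) * (1/f - 1/f_max), shows how an effective mutation rate mu/beta could be estimated as a regression slope. The functional form, the function name, and the synthetic data are assumptions, not details taken from the paper.

```python
def fit_effective_mutation_rate(frequencies, f_max):
    """Estimate mu/beta as the least-squares slope (through the origin)
    of the cumulative mutation count against 1/f - 1/f_max.

    Only mutations with frequency <= f_max are used.
    """
    freqs = sorted((f for f in frequencies if f <= f_max), reverse=True)
    xs = [1.0 / f - 1.0 / f_max for f in freqs]
    # The i-th highest frequency has i mutations at or above it.
    ys = list(range(1, len(freqs) + 1))
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need mutations with frequency below f_max")
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic frequencies constructed to follow the assumed law with mu/beta = 10.
F_MAX = 0.25
synthetic = [1.0 / (i / 10.0 + 1.0 / F_MAX) for i in range(1, 51)]
rate = fit_effective_mutation_rate(synthetic, F_MAX)
print(f"estimated mu/beta: {rate:.3f}")
```

On real data, sequencing noise and clonal peaks would have to be excluded by choosing an appropriate frequency window before fitting; that choice is outside the scope of this sketch.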


2018 ◽  
Author(s):  
Simone Zaccaria ◽  
Benjamin J. Raphael

Copy-number aberrations (CNAs) and whole-genome duplications (WGDs) are frequent somatic mutations in cancer. Accurate quantification of these mutations from DNA sequencing of bulk tumor samples is complicated by varying tumor purity, admixture of multiple tumor clones with distinct mutations, and high aneuploidy. Standard methods for CNA inference analyze tumor samples individually, but recently DNA sequencing of multiple samples from a cancer patient - e.g. from multiple regions of a primary tumor, matched primary/metastases, or multiple time points - has become common. We introduce a new algorithm, Holistic Allele-specific Tumor Copy-number Heterogeneity (HATCHet), that infers allele and clone-specific CNAs and WGDs jointly across multiple tumor samples from the same patient, and that leverages the relationships between clones in these samples. HATCHet provides a fresh perspective on CNA inference and includes several algorithmic innovations that overcome the limitations of existing methods, resulting in a more robust approach even for single-sample analysis. We also develop MASCoTE (Multiple Allele-specific Simulation of Copy-number Tumor Evolution), a framework for generating realistic simulated multi-sample DNA sequencing data with appropriate corrections for the differences in genome lengths between the normal and tumor clone(s) present in mixed samples. HATCHet outperforms current state-of-the-art methods on 256 simulated tumor samples from 64 patients, half with WGD. HATCHet's analysis of 49 primary tumor and metastasis samples from 10 prostate cancer patients reveals subclonal CNAs in only 29 of these samples, compared to the published reports of extensive subclonal CNAs in all samples. HATCHet's inferred CNAs are also more consistent with the reports of polyclonal origin and limited heterogeneity of metastasis in a subset of patients. 
HATCHet's analysis of 35 primary tumor and metastasis samples from 4 pancreas cancer patients reveals subclonal CNAs in 20 samples, WGDs in 3 patients, and tumor subclones that are shared across primary and metastasis samples from the same patient, none of which were described in the published analysis of these data. HATCHet substantially improves the analysis of CNAs and WGDs, leading to more reliable studies of tumor evolution in primary tumors and metastases.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yuping Li ◽  
Xiaoju Liang ◽  
Xuguo Zhou ◽  
Yu An ◽  
Ming Li ◽  
...  

Abstract Glycyrrhiza, a genus of perennial medicinal herbs, has been traditionally used to treat human diseases, including respiratory disorders. Functional analysis of genes involved in the synthesis, accumulation, and degradation of bioactive compounds in these medicinal plants requires accurate measurement of their expression profiles. Reverse transcription quantitative real-time PCR (RT-qPCR) is a primary tool for this purpose, and it requires stably expressed reference genes to serve as internal references for normalizing target gene expression. In this study, the stability of 14 candidate reference genes from the two congeneric species G. uralensis and G. inflata (ACT, CAC, CYP, DNAJ, DREB, EF1, RAN, TIF1, TUB, UBC2, ABCC2, COPS3, CS, and R3HDM2) was evaluated across different tissues and throughout various developmental stages. More importantly, we investigated the impact of interactions between tissue and developmental stage on the performance of candidate reference genes. Four algorithms (geNorm, NormFinder, BestKeeper, and Delta Ct) were used to analyze expression stability, and RefFinder, a comprehensive software tool, provided the final recommendation. Based on previous research and our preliminary data, we hypothesized that internal references for spatio-temporal gene expression are different from the reference genes suited for individual factors. In G. uralensis, the top three most stable reference genes across different tissues were R3HDM2, CAC and TUB, while CAC, CYP and ABCC2 were most suited for different developmental stages. CAC is the only candidate recommended for both biotic factors, which is reflected in the stability ranking for the spatio (tissue)-temporal (developmental stage) interactions (CAC, R3HDM2 and DNAJ). Similarly, in G. inflata, COPS3, R3HDM2 and DREB were selected for tissues, while RAN, COPS3 and CS were recommended for developmental stages. 
For the tissue-developmental stage interactions, COPS3, DREB and ABCC2 were the most suited reference genes. In both species, only one of the top three candidates was shared between the individual factors and their interactions, specifically CAC in G. uralensis and COPS3 in G. inflata, which supports our overarching hypothesis. In summary, spatio-temporal selection of reference genes not only lays the foundation for functional genomics research in Glycyrrhiza, but also helps these traditional medicinal herbs reach their full pharmaceutical potential.
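Of the four algorithms named above, the comparative Delta Ct approach is the simplest to illustrate. The sketch below is a simplified version, not the study's actual pipeline: for each candidate gene it computes the standard deviation of the Ct difference against every other candidate across samples and averages those values, so a lower score indicates a more stable gene. The Ct values are invented for illustration.

```python
from statistics import mean, stdev


def delta_ct_stability(ct_values):
    """Return a stability score per gene (lower means more stable).

    `ct_values` maps a gene name to its Ct values, one per sample,
    with samples in the same order for every gene.
    """
    scores = {}
    for gene, cts in ct_values.items():
        pairwise_sds = []
        for other, other_cts in ct_values.items():
            if other == gene:
                continue
            deltas = [a - b for a, b in zip(cts, other_cts)]
            pairwise_sds.append(stdev(deltas))
        scores[gene] = mean(pairwise_sds)
    return scores


# Hypothetical Ct values for three candidates across four samples.
ct = {
    "CAC": [20.0, 20.1, 19.9, 20.0],
    "TUB": [22.0, 22.2, 21.9, 22.1],
    "ACT": [18.0, 19.5, 17.2, 20.1],
}
scores = delta_ct_stability(ct)
ranking = sorted(scores, key=scores.get)  # most stable first
print(ranking)
```

To study tissue-by-stage interactions as the abstract describes, the same scoring could be run separately on samples grouped by tissue, by developmental stage, and by each tissue-stage combination, and the resulting rankings compared.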


2019 ◽  
Vol 102 (5) ◽  
pp. 1263-1270 ◽  
Author(s):  
Weili Xiong ◽  
Melinda A McFarland ◽  
Cary Pirone ◽  
Christine H Parker

Abstract Background: To effectively safeguard the food-allergic population and support compliance with food-labeling regulations, the food industry and regulatory agencies require reliable methods for food allergen detection and quantification. MS-based detection of food allergens relies on the systematic identification of robust and selective target peptide markers. The selection of proteotypic peptide markers, however, relies on the availability of high-quality protein sequence information, a bottleneck for the analysis of many plant-based proteomes. Method: In this work, data were compiled for reference tree nut ingredients and evaluated using a parsimony-driven global proteomics workflow. Results: The utility of supplementing existing incomplete protein sequence databases with translated genomic sequencing data was evaluated for English walnut and provided enhanced selection of candidate peptide markers and differentiation between closely related species. Highlights: Future improvements of protein databases and release of genomics-derived sequences are expected to facilitate the development of robust and harmonized LC–tandem MS-based methods for food allergen detection.


2008 ◽  
Vol 3 (3) ◽  
pp. 3
Author(s):  
Robert A. Wright

Objective – The aim of this article is to present evidence-based methods for the selection of chemistry monographs, particularly for librarians lacking a background in chemistry. These methods will be described in detail, their practical application illustrated, and their efficacy tested by analyzing circulation data. Methods – Two hundred and ninety-five chemistry monographs were selected between 2005 and 2007 using rigorously applied evidence-based methods involving the Library's integrated library system (ILS), Google, and SciFinder Scholar. The average circulation rate of this group of monographs was compared to the average circulation rate of 254 chemistry monographs selected between 2002 and 2004, when the methods were not used or were in an incomplete state of development. Results – Circulations/month were on average 9% greater in the cohort of monographs selected with the rigorously applied evidence-based methods. Further statistical analysis, however, finds that this result cannot be attributed to the different application of these methods. Conclusion – The methods discussed in this article appear to provide an evidence base for the selection of chemistry monographs, but their application does not change circulation rates in a statistically significant way. Further research is needed to determine if this lack of statistical significance is real or a product of the organic development and application of these methods over time, which makes definitive comparisons difficult.


Cancers ◽  
2021 ◽  
Vol 13 (18) ◽  
pp. 4560
Author(s):  
Jerome Griffon ◽  
Delphine Buffello ◽  
Alain Giron ◽  
S. Lori Bridal ◽  
Michele Lamuraglia

Purpose: There is a clinical need to better characterize the tumor microenvironment non-invasively in order to reveal evidence of early tumor response to therapy and to better understand therapeutic response. The goals of this work are, first, to compare the sensitivity to modifications occurring during tumor growth for measurements of tumor volume, immunohistochemistry parameters, and emerging ultrasound parameters (Shear Wave Elastography (SWE) and dynamic Contrast-Enhanced Ultrasound (CEUS)), and second, to study the link between the different parameters. Methods: Five different groups of 9 to 10 BALB/c female mice with subcutaneous CT26 tumors were imaged using B-mode morphological imaging, SWE, and CEUS at different dates. Whole-slice immunohistological data stained for the nuclei, T lymphocytes, apoptosis, and vascular endothelium from these tumors were analyzed. Results: Tumor volume and three CEUS parameters (Time to Peak, Wash-In Rate, and Wash-Out Rate) significantly changed over time. The immunohistological parameters, CEUS parameters, and SWE parameters showed intracorrelation. Four immunohistological parameters (the number of T lymphocytes per mm² and its standard deviation, the percentage area of apoptosis, and the colocalization of apoptosis and vascular endothelium) were correlated with the CEUS parameters (Time to Peak, Wash-In Rate, Wash-Out Rate, and Mean Transit Time). The SWE parameters were correlated neither with the CEUS parameters nor with the immunohistological parameters. Conclusions: US imaging can provide additional information on tumoral changes. This could help to better explore the effect of therapies on tumor evolution, by studying the evolution of the parameters over time and by studying their correlations.


2016 ◽  
Vol 1 (2) ◽  
pp. 128-140 ◽  
Author(s):  
Cristina Capineri

Drawing on John Agnew’s (1987) theoretical framework for the analysis of place (location, locale and sense of place) and on Doreen Massey’s (1991) interpretation of Kilburn High Road (London), the contribution analyzes the notion of place in this case study by comparing the semantics emerging from Massey’s interpretation in the late Nineties with those from a selection of noisy and unstructured volunteered geographic information (VGI) collected from Flickr photos and Tweets harvested in 2014–2015. The comparison shows how sense of place is dynamic and changes over time, and explores Kilburn High Road through the categories of location, locale and sense of place derived from the qualitative analysis of VGI content and annotations. The contribution shows how VGI can help discover the unique relationship between people and place, which takes the form given by Doreen Massey to Kilburn High Road and then moves on to the many forms given by people experiencing Kilburn High Road through a photo, a Tweet or a simple narrative. Finally, the paper suggests that the analysis of VGI content can help detect the relevant features of street life, from infrastructure to citizens’ perceptions, which should be taken into account for a more human-centered approach in planning or service management.


2018 ◽  
Author(s):  
An-Shun Tai ◽  
Chien-Hua Peng ◽  
Shih-Chi Peng ◽  
Wen-Ping Hsieh

Abstract Multistage tumorigenesis is a dynamic process characterized by the accumulation of mutations. Thus, a tumor mass is composed of genetically divergent cell subclones. With the advancement of next-generation sequencing (NGS), mathematical models have recently been developed to decompose tumor subclonal architecture from collective genome sequencing data. Most of these methods focus on single-nucleotide variants (SNVs). However, somatic copy number aberrations (CNAs) also play critical roles in carcinogenesis. Therefore, modeling subclonal CNA composition holds promise for improving the analysis of tumor heterogeneity and cancer evolution. To address this issue, we developed a two-way mixture Poisson model, named CloneDeMix, for the deconvolution of read-depth information. It can infer the subclonal copy number, mutational cellular prevalence (MCP), subclone composition, and the order in which mutations occurred in the evolutionary hierarchy. The performance of CloneDeMix was systematically assessed in simulations. The accuracy of CNA inference was nearly 93%, and the MCP was also accurately recovered. Furthermore, we demonstrated its applicability using head and neck cancer samples from TCGA. Our results reveal the extent of subclonal CNA diversity, and we also discovered a group of candidate genes that probably initiate lymph node metastasis during tumor evolution. Most importantly, these driver genes are located at 11q13.3, which is highly susceptible to copy number change in head and neck cancer genomes. This study successfully estimates subclonal CNAs and exhibits the evolutionary relationships of mutation events. By doing so, we can track tumor heterogeneity and identify crucial mutations during the evolutionary process. Hence, it facilitates not only understanding cancer development but also finding potential therapeutic targets. 
In brief, this framework has implications for improved modeling of tumor evolution and highlights the importance of including subclonal CNAs.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guilherme B. Neumann ◽  
Paula Korkuć ◽  
Danny Arends ◽  
Manuel J. Wolf ◽  
Katharina May ◽  
...  

Abstract Background: German Black Pied cattle (DSN) are an endangered dual-purpose breed that was largely replaced by Holstein cattle because of the breed's lower milk yield. DSN cattle are kept as a genetic reserve with a current herd size of around 2500 animals. The ability to track sequence variants specific to DSN could help to support the conservation of DSN’s genetic diversity and to provide avenues for genetic improvement. Results: Whole-genome sequencing data of 304 DSN cattle were used to design a customized DSN200k SNP chip harboring 182,154 variants (173,569 SNPs and 8585 indels) based on ten selection categories. We included variants of interest to DSN, such as DSN unique variants and variants from previous association studies in DSN, but also variants of general interest, such as variants with predicted consequences of high, moderate, or low impact on the transcripts and SNPs from the Illumina BovineSNP50 BeadChip. Further, the selection of variants based on haplotype blocks ensured that the whole genome was uniformly covered, with an average variant distance of 14.4 kb on autosomes. Using 300 DSN and 162 animals from other cattle breeds, including Holstein, endangered local cattle populations, and also a Bos indicus breed, the performance of the SNP chip was evaluated. Altogether, 171,978 (94.31%) of the variants were successfully called in at least one of the analyzed breeds. In DSN, the number of successfully called variants was 166,563 (91.44%), while 156,684 (86.02%) were segregating at a minor allele frequency > 1%. The concordance rate between technical replicates was 99.83 ± 0.19%. Conclusion: The DSN200k SNP chip proved useful for DSN and other Bos taurus breeds as well as one Bos indicus breed. It is suitable for genetic diversity management and marker-assisted selection of DSN animals. Moreover, variants that were segregating in other breeds can be used for the design of breed-specific customized SNP chips. 
This will be of great value in the application of conservation programs for endangered local populations in the future.
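The two per-variant statistics reported above, call rate and minor allele frequency (MAF), can be sketched as follows. The genotype encoding (0, 1, or 2 copies of the alternate allele, with None for a failed call), the function names, and the example genotypes are illustrative assumptions and are not taken from the study.

```python
def call_rate(genotypes):
    """Fraction of animals with a successful genotype call at one variant."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency at one variant, ignoring failed calls.

    Returns None when no animal was successfully called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid animal
    return min(alt_freq, 1.0 - alt_freq)


# Hypothetical genotypes for ten animals at a single variant.
genotypes = [0, 1, 2, None, 0, 0, 1, 0, 0, 0]
rate = call_rate(genotypes)
maf = minor_allele_frequency(genotypes)
segregating = maf is not None and maf > 0.01  # threshold used in the abstract
print(f"call rate = {rate:.2f}, MAF = {maf:.3f}, segregating = {segregating}")
```

Applying these functions to every variant on a chip and counting those that pass the MAF threshold would give a summary of the kind reported in the Results.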

