A pan-cancer signature of neutral tumor evolution

Mapping Intimacies ◽

10.1101/014894 ◽

2015 ◽

Cited By ~ 2

Author(s):

Andrea Sottoriva ◽

Trevor Graham

Keyword(s):

Power Law ◽

Large Scale ◽

Evolutionary Dynamics ◽

Genomic Data ◽

Next Generation Sequencing Data ◽

Patient Specific ◽

Tumor Evolution ◽

Sequencing Data ◽

Power Law Distribution ◽

Multiple Tumor

Despite extraordinary efforts to profile cancer genomes on a large scale, interpreting the vast amount of genomic data in the light of cancer evolution and in a clinically relevant manner remains challenging. Here we demonstrate that cancer next-generation sequencing data is dominated by the signature of growth governed by a power-law distribution of mutant allele frequencies. The power-law signature is common to multiple tumor types and is a consequence of the effectively-neutral evolutionary dynamics that underpin the evolution of a large proportion of cancers, giving rise to the abundance of mutations responsible for intra-tumor heterogeneity. Importantly, the law allows the measurement, in each individual cancer, of the in vivo mutation rate and the timing of mutations with remarkable precision. This result provides a new way to interpret cancer genomic data by considering the physics of tumor growth in a way that is both patient-specific and clinically relevant.

Download Full-text

rmvPFBAM: Removing Primers from BAM Files Based on Amplicon-Based Next-Generation Sequencing and Cloud Computing When Analyzing Personal Genome Data

Scientific Programming ◽

10.1155/2021/6536470 ◽

2021 ◽

Vol 2021 ◽

pp. 1-6

Author(s):

Yanjun Ma

Keyword(s):

Next Generation Sequencing ◽

False Positive ◽

Large Scale ◽

Genomic Data ◽

Next Generation Sequencing Data ◽

Personal Genome ◽

Next Generation ◽

Sequencing Data ◽

Personal Genomic ◽

Generation Sequencing

Personal genomic data constitute one important part of personal health data. However, due to the large amount of personal genomic data obtained by the next-generation sequencing technology, special tools are needed to analyze these data. In this article, we will explore a tool analyzing cloud-based large-scale genome sequencing data. Analyzing and identifying genomic variations from amplicon-based next-generation sequencing data are necessary for the clinical diagnosis and treatment of cancer patients. When processing the amplicon-based next-generation sequencing data, one essential step is removing primer sequences from the reads to avoid detecting false-positive mutations introduced by nonspecific primer binding and primer extension reactions. At present, the removing primer tools usually discard primer sequences from the FASTQ file instead of BAM file, but this method could cause some downstream analysis problems. Only one tool (BAMClipper) removes primer sequences from BAM files, but it only modified the CIGAR value of the BAM file, and false-positive mutations falling in the primer region could still be detected based on its processed BAM file. So, we developed one cutting primer tool (rmvPFBAM) removing primer sequences from the BAM file, and the mutations detected based on the processed BAM file by rmvPFBAM are highly credible. Besides that, rmvPFBAM runs faster than other tools, such as cutPrimers and BAMClipper.

Download Full-text

Compression of Next-Generation Sequencing Data and of DNA Digital Files

Algorithms ◽

10.3390/a13060151 ◽

2020 ◽

Vol 13 (6) ◽

pp. 151

Author(s):

Bruno Carpentieri

Keyword(s):

Next Generation Sequencing ◽

Dna Sequences ◽

Network Traffic ◽

Large Scale ◽

Genomic Data ◽

Biological Data ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

The increase in memory and in network traffic used and caused by new sequenced biological data has recently deeply grown. Genomic projects such as HapMap and 1000 Genomes have contributed to the very large rise of databases and network traffic related to genomic data and to the development of new efficient technologies. The large-scale sequencing of samples of DNA has brought new attention and produced new research, and thus the interest in the scientific community for genomic data has greatly increased. In a very short time, researchers have developed hardware tools, analysis software, algorithms, private databases, and infrastructures to support the research in genomics. In this paper, we analyze different approaches for compressing digital files generated by Next-Generation Sequencing tools containing nucleotide sequences, and we discuss and evaluate the compression performance of generic compression algorithms by confronting them with a specific system designed by Jones et al. specifically for genomic file compression: Quip. Moreover, we present a simple but effective technique for the compression of DNA sequences in which we only consider the relevant DNA data and experimentally evaluate its performances.

Download Full-text

Spatially constrained tumour growth affects the patterns of clonal selection and neutral drift in cancer genomic data

10.1101/544536 ◽

2019 ◽

Cited By ~ 3

Author(s):

Kate Chkhaidze ◽

Timon Heide ◽

Benjamin Werner ◽

Marc J. Williams ◽

Weini Huang ◽

...

Keyword(s):

Next Generation Sequencing ◽

Tumour Growth ◽

Evolutionary Dynamics ◽

Clonal Selection ◽

Genomic Data ◽

Confounding Factors ◽

Data Generation ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

AbstractQuantification of the effect of spatial tumour sampling on the patterns of mutations detected in next-generation sequencing data is largely lacking. Here we use a spatial stochastic cellular automaton model of tumour growth that accounts for somatic mutations, selection, drift and spatial constrains, to simulate multi-region sequencing data derived from spatial sampling of a neoplasm. We show that the spatial structure of a solid cancer has a major impact on the detection of clonal selection and genetic drift from bulk sequencing data and single-cell sequencing data. Our results indicate that spatial constrains can introduce significant sampling biases when performing multi-region bulk sampling and that such bias becomes a major confounding factor for the measurement of the evolutionary dynamics of human tumours. We present a statistical inference framework that takes into account the spatial effects of a growing tumour and allows inferring the evolutionary dynamics from patient genomic data. Our analysis shows that measuring cancer evolution using next-generation sequencing while accounting for the numerous confounding factors requires a mechanistic model-based approach that captures the sources of noise in the data.SummarySequencing the DNA of cancer cells from human tumours has become one of the main tools to study cancer biology. However, sequencing data are complex and often difficult to interpret. In particular, the way in which the tissue is sampled and the data are collected, impact the interpretation of the results significantly. We argue that understanding cancer genomic data requires mathematical models and computer simulations that tell us what we expect the data to look like, with the aim of understanding the impact of confounding factors and biases in the data generation step. In this study, we develop a spatial simulation of tumour growth that also simulates the data generation process, and demonstrate that biases in the sampling step and current technological limitations severely impact the interpretation of the results. We then provide a statistical framework that can be used to overcome these biases and more robustly measure aspects of the biology of tumours from the data.

Download Full-text

Opposing Evolutionary Pressures Drive Clonal Evolution and Health Outcomes in the Aging Blood System

Blood ◽

10.1182/blood-2020-142086 ◽

2020 ◽

Vol 136 (Supplement 1) ◽

pp. 37-37

Author(s):

Kimberly Skead ◽

Armande Ang Houle ◽

Sagi Abelson ◽

Marie-Julie Fave ◽

Boxi Lin ◽

...

Keyword(s):

Gene Disruption ◽

Large Scale ◽

Evolutionary Dynamics ◽

Clonal Evolution ◽

Neutral Evolution ◽

Driver Mutations ◽

Cancer Evolution ◽

Sequencing Data ◽

High Coverage ◽

Passenger Mutations

The age-associated accumulation of somatic mutations and large-scale structural variants (SVs) in the early hematopoietic hierarchy have been linked to premalignant stages for cancer and cardiovascular disease (CVD). However, only a small proportion of individuals harboring these mutations progress to disease, and mechanisms driving the transformation to malignancy remains unclear. Hematopoietic evolution, and cancer evolution more broadly, has largely been studied through a lens of adaptive evolution and the contribution of functionally neutral or mildly damaging mutations to early disease-associated clonal expansions has not been well characterised despite comprising the majority of the mutational burden in healthy or tumoural tissues. Through combining deep learning with population genetics, we interrogate the hematopoietic system to capture signatures of selection acting in healthy and pre-cancerous blood populations. Here, we leverage high-coverage sequencing data from healthy and pre-cancerous individuals from the European Prospective Investigation into Cancer and Nutrition Study (n=477) and dense genotyping from the Canadian Partnership for Tomorrow's Health (n=5,000) to show that blood rejects the paradigm of strictly adaptive or neutral evolution and is subject to pervasive negative selection. We observe clear age associations across hematopoietic populations and the dominant class of selection driving evolutionary dynamics acting at an individual level. We find that both the location and ratio of passenger to driver mutations are critical in determining if positive selection acting on driver mutations is able to overwhelm regulated hematopoiesis and allow clones harbouring disease-predisposing mutations to rise to dominance. Certain genes are enriched for passenger mutations in healthy individuals fitting purifying models of evolution, suggesting that the presence of passenger mutations in a subset of genes might confer a protective role against disease-predisposing clonal expansions. Finally, we find that the density of gene disruption events with known pathogenic associations in somatic SVs impacts the frequency at which the SV segregates in the population with variants displaying higher gene disruption density segregating at lower frequencies. Understanding how blood evolves towards malignancy will allow us to capture cancer in its earliest stages and identify events initiating departures from healthy blood evolution. Further, as the majority of mutations are passengers, studying their contribution to tumorigenesis, will unveil novel therapeutic targets thus enabling us to better understand patterns of clonal evolution in order to diagnose and treat disease in its infancy. Disclosures Dick: Bristol-Myers Squibb/Celgene: Research Funding.

Download Full-text

Speeding Up Large-Scale Next Generation Sequencing Data Analysis with pBWA

Journal of Applied Bioinformatics & Computational Biology ◽

10.4172/2329-9533.1000101 ◽

2017 ◽

Vol 01 (01) ◽

Cited By ~ 4

Author(s):

Darren Peters ◽

Xuemei Luo ◽

Ke Qiu ◽

Ping Liang

Keyword(s):

Data Analysis ◽

Next Generation Sequencing ◽

Large Scale ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing ◽

Sequencing Data Analysis

Download Full-text

InteractomeSeq: a web server for the identification and profiling of domains and epitopes from phage display and next generation sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkaa363 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W200-W207

Author(s):

Simone Puccio ◽

Giorgio Grillo ◽

Arianna Consiglio ◽

Maria Felicia Soluri ◽

Daniele Sblattero ◽

...

Keyword(s):

Phage Display ◽

Large Scale ◽

High Throughput Sequencing ◽

Gene Annotation ◽

Web Server ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Phage Display Technology ◽

Essential Information ◽

Research Fields

Abstract High-Throughput Sequencing technologies are transforming many research fields, including the analysis of phage display libraries. The phage display technology coupled with deep sequencing was introduced more than a decade ago and holds the potential to circumvent the traditional laborious picking and testing of individual phage rescued clones. However, from a bioinformatics point of view, the analysis of this kind of data was always performed by adapting tools designed for other purposes, thus not considering the noise background typical of the ‘interactome sequencing’ approach and the heterogeneity of the data. InteractomeSeq is a web server allowing data analysis of protein domains (‘domainome’) or epitopes (‘epitome’) from either Eukaryotic or Prokaryotic genomic phage libraries generated and selected by following an Interactome sequencing approach. InteractomeSeq allows users to upload raw sequencing data and to obtain an accurate characterization of domainome/epitome profiles after setting the parameters required to tune the analysis. The release of this tool is relevant for the scientific and clinical community, because InteractomeSeq will fill an existing gap in the field of large-scale biomarkers profiling, reverse vaccinology, and structural/functional studies, thus contributing essential information for gene annotation or antigen identification. InteractomeSeq is freely available at https://InteractomeSeq.ba.itb.cnr.it/

Download Full-text

ASCOT identifies key regulators of neuronal subtype-specific splicing

Nature Communications ◽

10.1038/s41467-019-14020-5 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 2

Author(s):

Jonathan P. Ling ◽

Christopher Wilks ◽

Rone Charles ◽

Patrick J. Leavey ◽

Devlina Ghosh ◽

...

Keyword(s):

Rna Splicing ◽

Large Scale ◽

Splice Variants ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Cell Type ◽

Sequencing Data ◽

Large Scale Analysis ◽

Cell Type Specific ◽

Public Archives

AbstractPublic archives of next-generation sequencing data are growing exponentially, but the difficulty of marshaling this data has led to its underutilization by scientists. Here, we present ASCOT, a resource that uses annotation-free methods to rapidly analyze and visualize splice variants across tens of thousands of bulk and single-cell data sets in the public archive. To demonstrate the utility of ASCOT, we identify novel cell type-specific alternative exons across the nervous system and leverage ENCODE and GTEx data sets to study the unique splicing of photoreceptors. We find that PTBP1 knockdown and MSI1 and PCBP2 overexpression are sufficient to activate many photoreceptor-specific exons in HepG2 liver cancer cells. This work demonstrates how large-scale analysis of public RNA-Seq data sets can yield key insights into cell type-specific control of RNA splicing and underscores the importance of considering both annotated and unannotated splicing events.

Download Full-text

Phase Synchronization Stability of Non-Homogeneous Low-Voltage Distribution Networks with Large-Scale Distributed Generations

Energies ◽

10.3390/en13051257 ◽

2020 ◽

Vol 13 (5) ◽

pp. 1257 ◽

Cited By ~ 1

Author(s):

Shi Chen ◽

Hong Zhou ◽

Jingang Lai ◽

Yiwei Zhou ◽

Chang Yu

Keyword(s):

Power Law ◽

Coupling Strength ◽

Large Scale ◽

Phase Synchronization ◽

Distribution Networks ◽

Critical Coupling ◽

Power Law Distribution ◽

Distributed Generations ◽

Strength Formula ◽

The Impact

The ideal distributed network composed of distributed generations (DGs) has unweighted and undirected interactions which omit the impact of the power grid structure and actual demand. Apparently, the coupling relationship between DGs, which is determined by line impedance, node voltage, and droop coefficient, is generally non-homogeneous. Motivated by this, this paper investigates the phase synchronization of an islanded network with large-scale DGs in a non-homogeneous condition. Furthermore, we explicitly deduce the critical coupling strength formula for different weighting cases via the synchronization condition. On this basis, three cases of Gaussian distribution, power-law distribution, and frequency-weighted distribution are analyzed. A synthetical analysis is also presented, which helps to identify the order parameter. Finally, this paper employs the numerical simulation methods to test the effectiveness of the critical coupling strength formula and the superiority over the power-law distribution.

Download Full-text

Online Reviews' Trustworthiness Analysis Based on Power-Law Distribution

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.838-841.3260 ◽

2013 ◽

Vol 838-841 ◽

pp. 3260-3267

Author(s):

Qi Chang Yao ◽

Xin Feng ◽

Qi Ming Sun

Keyword(s):

Power Law ◽

Attitude Change ◽

Large Scale ◽

Online Shopping ◽

Online Reviews ◽

Subjective Perception ◽

Power Exponent ◽

Power Law Distribution ◽

Consumer Reviews ◽

Consumer Attention

In online shopping, studies on consumer reviews are mostly based on the Attitude Change Model. Illustrated from the perspective of perceived trustworthiness, however, it is not easy to measure and characterize the subjective perception of consumers. Starting from the inherent property of online reviews and based on the real data of 360buy which is the domestic large-scale B2C commerce website in China, this paper focuses on the interval distribution of consumer reviews and the data for statistical analysis. Research finds that the distribution of reviews interval can be depicted by the power-law function and there is a monotonically increasing relationship between power-exponent and the customers concerns with the commodity, the higher exponent, the attention consumer get. The findings give an objective basis to judge the credibility of online reviews. The relationship between power-exponent and the consumer attention has demonstrated the vital role of consumer attention in online shopping, and then the double parity between attention and exponent will effectively regulate the e-commerce market environment and promote its healthy operation. Tech Publications.

Download Full-text

WBFQC: A new approach for compressing next-generation sequencing data splitting into homogeneous streams

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001850018x ◽

2018 ◽

Vol 16 (05) ◽

pp. 1850018 ◽

Cited By ~ 1

Author(s):

Sanjeev Kumar ◽

Suneeta Agarwal ◽

Ranvijay

Keyword(s):

Next Generation Sequencing ◽

Genomic Data ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Compression Technique ◽

Compression Algorithms ◽

Ngs Data ◽

And Storage ◽

Generation Sequencing

Genomic data nowadays is playing a vital role in number of fields such as personalized medicine, forensic, drug discovery, sequence alignment and agriculture, etc. With the advancements and reduction in the cost of next-generation sequencing (NGS) technology, these data are growing exponentially. NGS data are being generated more rapidly than they could be significantly analyzed. Thus, there is much scope for developing novel data compression algorithms to facilitate data analysis along with data transfer and storage directly. An innovative compression technique is proposed here to address the problem of transmission and storage of large NGS data. This paper presents a lossless non-reference-based FastQ file compression approach, segregating the data into three different streams and then applying appropriate and efficient compression algorithms on each. Experiments show that the proposed approach (WBFQC) outperforms other state-of-the-art approaches for compressing NGS data in terms of compression ratio (CR), and compression and decompression time. It also has random access capability over compressed genomic data. An open source FastQ compression tool is also provided here ( http://www.algorithm-skg.com/wbfqc/home.html ).

Download Full-text