Population Substructure Has Implications in Validating Next-Generation Cancer Genomics Studies with TCGA

2019 ◽  
Vol 20 (5) ◽  
pp. 1192 ◽  
Author(s):  
Marina Miller ◽  
Eric Devor ◽  
Erin Salinas ◽  
Andreea Newtson ◽  
Michael Goodheart ◽  
...  

In the era of large genetic and genomic datasets, it has become crucially important to validate the results of individual studies against data from publicly available sources, such as The Cancer Genome Atlas (TCGA). But how generalizable are results from either an independent or a large public dataset to the remainder of the population? The study presented here aims to answer that question. Using next-generation sequencing data from endometrial and ovarian cancer patients from both the University of Iowa and TCGA, the genomic admixture of each population was analyzed with the STRUCTURE and ADMIXTURE software packages. In our independent dataset, one subpopulation was identified, whereas in TCGA 4–6 subpopulations were identified. The data presented here demonstrate how different the genetic substructures of the TCGA and University of Iowa populations are. Validation of genomic studies across two different population samples must therefore recognize, account for, and correct for background genetic substructure.
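As a rough illustration of the admixture summaries involved, the sketch below counts, for a single sample, how many ancestry components of an ADMIXTURE-style Q-matrix row exceed a threshold, and picks K by minimum cross-validation error. All proportions, thresholds, and CV errors here are made up for illustration:

```python
# Illustrative sketch (hypothetical data): summarising an ADMIXTURE-style
# Q matrix, where each row gives one sample's ancestry proportions across
# K inferred subpopulations. A homogeneous cohort shows one dominant
# component per sample; an admixed cohort such as TCGA shows several.

def effective_components(q_row, threshold=0.05):
    """Count ancestry components contributing more than `threshold`."""
    return sum(1 for p in q_row if p > threshold)

def choose_k(cv_errors):
    """Pick the K with the lowest ADMIXTURE cross-validation error."""
    return min(cv_errors, key=cv_errors.get)

# Hypothetical Q-matrix rows for K=3 (values invented for illustration).
q_homogeneous = [0.97, 0.02, 0.01]   # e.g. a single-subpopulation sample
q_admixed     = [0.50, 0.30, 0.20]   # e.g. a sample from an admixed cohort

print(effective_components(q_homogeneous))  # → 1
print(effective_components(q_admixed))      # → 3

# Hypothetical cross-validation errors over candidate values of K.
print(choose_k({1: 0.61, 2: 0.58, 4: 0.52, 6: 0.55}))  # → 4
```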

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1144
Author(s):  
Nan Xiao ◽  
Soner Koc ◽  
David Roberson ◽  
Phillip Brooks ◽  
Manisha Ray ◽  
...  

The BioCompute Object (BCO) standard is an IEEE standard (IEEE 2791-2020) designed to facilitate the communication of next-generation sequencing data analyses, with applications across academia, government agencies, and industry. For example, the Food and Drug Administration (FDA) supports the standard for regulatory submissions and includes it in their Data Standards Catalog for the submission of HTS data. We created the BCO App to facilitate BCO generation in a range of computational environments and, in part, to participate in the Advanced Track of the precisionFDA BioCompute Object App-a-thon. The application generates BCOs both from workflow metadata provided as plain text and from workflow contents written in the Common Workflow Language. It can also access and ingest task execution results from the Cancer Genomics Cloud (CGC), an NCI-funded computational platform. Creating a BCO from a CGC task significantly reduces the time required to generate a BCO on the CGC by auto-populating workflow information fields from CGC workflow and task execution results. The BCO App supports exporting BCOs as JSON or PDF files and publishing BCOs to both the CGC platform and GitHub repositories.
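A minimal sketch of the JSON skeleton a BCO carries, assuming the top-level domain names of the IEEE 2791 schema. All field values below are placeholders, and a real BCO (as produced by the BCO App) must validate against the full schema:

```python
import json

# Skeleton of an IEEE 2791-2020 BioCompute Object, assembled as plain
# JSON. Every value here is a placeholder for illustration; the BCO App
# automates populating these domains from workflow and task metadata.
bco = {
    "object_id": "https://example.org/bco/BCO_000001",  # placeholder ID
    "spec_version": "https://w3id.org/ieee/ieee-2791-schema/2791object.json",
    "etag": "",  # checksum of the object, computed after assembly
    "provenance_domain": {
        "name": "Example NGS analysis",
        "version": "1.0",
        "created": "2020-01-01T00:00:00Z",
        "modified": "2020-01-01T00:00:00Z",
        "contributors": [{"name": "Jane Doe", "contribution": ["authoredBy"]}],
        "license": "https://spdx.org/licenses/CC-BY-4.0.html",
    },
    "usability_domain": ["Illustrative pipeline description."],
    "description_domain": {"keywords": ["HTS"], "pipeline_steps": []},
    "execution_domain": {
        "script": [],
        "script_driver": "shell",
        "software_prerequisites": [],
        "external_data_endpoints": [],
        "environment_variables": {},
    },
    "io_domain": {"input_subdomain": [], "output_subdomain": []},
}

# Serialise to JSON, the export format the BCO App supports.
print(sorted(bco))  # top-level BCO fields
```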


2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Chi Zhou ◽  
Zhiting Wei ◽  
Zhanbing Zhang ◽  
Biyu Zhang ◽  
Chenyu Zhu ◽  
...  

Abstract

Background: Cancer neoantigens are expressed only in cancer cells and are presented on the tumor cell surface in complex with major histocompatibility complex (MHC) class I proteins for recognition by cytotoxic T cells. Accurate and rapid identification of neoantigens plays a pivotal role in cancer immunotherapy. Although several in silico tools for neoantigen prediction have been presented, these tools have limitations.

Results: We developed pTuneos, a computational pipeline for prioritizing tumor neoantigens from next-generation sequencing data. We tested the performance of pTuneos on melanoma cancer-vaccine cohort data and tumor-infiltrating lymphocyte (TIL)-recognized neopeptide data. pTuneos predicts the MHC presentation and T cell recognition ability of candidate neoantigens, as well as the actual immunogenicity of single-nucleotide variant (SNV)-based neopeptides accounting for their natural processing and presentation, surpassing existing tools in a comprehensive, quantitative benchmark of neoantigen prioritization performance and running time. pTuneos was further tested on The Cancer Genome Atlas (TCGA) cohort data, as well as on melanoma and non-small cell lung cancer (NSCLC) cohorts undergoing checkpoint-blockade immunotherapy. The overall neoantigen immunogenicity score proposed by pTuneos proves to be a powerful pan-cancer marker for survival prediction compared with traditional, well-established biomarkers.

Conclusions: In summary, pTuneos provides a state-of-the-art, one-stop, user-friendly solution for prioritizing SNV-based candidate neoepitopes, which could help advance research on next-generation cancer immunotherapies and personalized cancer vaccines. pTuneos is available at https://github.com/bm2-lab/pTuneos, with a Docker version for quick deployment at https://cloud.docker.com/u/bm2lab/repository/docker/bm2lab/ptuneos.
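As a caricature of the inputs such prioritization pipelines combine, the toy score below mixes predicted MHC binding affinity, expression, and a mutant-to-wild-type binding ratio. This is an invented formula for illustration only, not pTuneos's actual model:

```python
import math

# Toy prioritisation score for SNV-derived neopeptides. NOT pTuneos's
# model: it only illustrates the kinds of inputs such pipelines combine,
# i.e. predicted MHC binding affinity, mutant-peptide expression, and an
# agretopicity-style ratio of wild-type to mutant binding affinity.

def binding_score(ic50_nm, midpoint=500.0):
    """Map an IC50 (nM) to (0, 1); tighter binding (lower IC50) scores higher."""
    return 1.0 / (1.0 + ic50_nm / midpoint)

def toy_priority(mut_ic50, wt_ic50, tpm):
    agretopicity = wt_ic50 / mut_ic50    # >1 means the mutant binds better
    expression = math.log2(tpm + 1)      # dampen the expression range
    return binding_score(mut_ic50) * min(agretopicity, 10.0) * expression

# A strong binder (50 nM) whose wild-type counterpart binds poorly, from
# a well-expressed gene, outranks a weak, barely expressed candidate.
strong = toy_priority(mut_ic50=50, wt_ic50=5000, tpm=30)
weak = toy_priority(mut_ic50=2000, wt_ic50=2000, tpm=1)
print(strong > weak)  # → True
```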


2018 ◽  
Vol 17 ◽  
pp. 117693511877478 ◽  
Author(s):  
Jovan Cejovic ◽  
Jelena Radenkovic ◽  
Vladimir Mladenovic ◽  
Adam Stanojevic ◽  
Milica Miletic ◽  
...  

Increased efforts in cancer genomics research and bioinformatics are producing tremendous amounts of data. These data are diverse in origin, format, and content. As the amount of available sequencing data increases, technologies that make those data discoverable and usable are critically needed. In response, we have developed a Semantic Web–based Data Browser, a tool that allows users to visually build and execute ontology-driven queries. This approach simplifies access to available data and improves the process of using them in analyses on the Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org). The Data Browser makes large data sets easily explorable and simplifies the retrieval of specific data of interest. Although initially implemented on top of The Cancer Genome Atlas (TCGA) data set, the Data Browser's architecture allows for seamless integration of other data sets. By deploying it on the CGC, we have enabled remote researchers to access data and perform collaborative investigations.
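The kind of ontology-driven query the Data Browser builds visually (for example, cases of a given disease that have files from a given experimental strategy) can be sketched as a filter over an in-memory catalogue. The schema and records below are hypothetical:

```python
# Illustrative sketch (hypothetical schema): a query over linked
# case/file metadata, of the sort the Data Browser composes visually
# and executes against TCGA metadata on the CGC.

catalog = [
    {"case": "C1", "disease": "Ovarian Serous Cystadenocarcinoma",
     "files": [{"strategy": "RNA-Seq"}, {"strategy": "WXS"}]},
    {"case": "C2", "disease": "Uterine Corpus Endometrial Carcinoma",
     "files": [{"strategy": "WXS"}]},
]

def cases_with(disease_substr, strategy):
    """Cases whose disease matches and that have a file of `strategy`."""
    return [c["case"] for c in catalog
            if disease_substr in c["disease"]
            and any(f["strategy"] == strategy for f in c["files"])]

print(cases_with("Ovarian", "RNA-Seq"))       # → ['C1']
print(cases_with("Endometrial", "RNA-Seq"))   # → []
```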


2019 ◽  
Author(s):  
Tingting Gong ◽  
Vanessa M Hayes ◽  
Eva KF Chan

Abstract: Somatic structural variants (SVs) play a significant role in cancer development and evolution, but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributed to the purity of tumour samples, tumour heterogeneity, the limitations of short-read information from NGS, and sequence alignment ambiguities. In spite of active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of eight commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the eight SV callers examined in this paper. As the importance of large structural variants becomes increasingly recognised in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection and guidance on selecting an appropriate SV caller.
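The sensitivity and precision comparison such benchmarks rest on can be sketched generically. The matching rule below (same SV type, both breakpoints within a tolerance window) and all coordinates are illustrative assumptions, not the review's exact evaluation protocol:

```python
# Sketch of the sensitivity/precision computation used when benchmarking
# SV callers against a truth set. A called SV matches a truth SV when
# the type agrees and both breakpoints fall within a tolerance window.
# All variants and the tolerance value are made up for illustration.

def matches(call, truth, tol=200):
    return (call["type"] == truth["type"]
            and abs(call["start"] - truth["start"]) <= tol
            and abs(call["end"] - truth["end"]) <= tol)

def benchmark(calls, truths, tol=200):
    tp = sum(any(matches(c, t, tol) for c in calls) for t in truths)
    fp = sum(not any(matches(c, t, tol) for t in truths) for c in calls)
    sensitivity = tp / len(truths) if truths else 0.0
    precision = (len(calls) - fp) / len(calls) if calls else 0.0
    return sensitivity, precision

truths = [{"type": "DEL", "start": 10_000, "end": 15_000},
          {"type": "DUP", "start": 40_000, "end": 48_000}]
calls = [{"type": "DEL", "start": 10_050, "end": 14_980},   # true positive
         {"type": "INV", "start": 70_000, "end": 71_000}]   # false positive

sens, prec = benchmark(calls, truths)
print(sens, prec)  # → 0.5 0.5  (one of two truths found; one of two calls real)
```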


2019 ◽  
Vol 2 (4) ◽  
pp. e201900336 ◽  
Author(s):  
Eric Olivier Audemard ◽  
Patrick Gendron ◽  
Albert Feghaly ◽  
Vincent-Philippe Lavallée ◽  
Josée Hébert ◽  
...  

Mutations identified in acute myeloid leukemia patients are useful for prognosis and for selecting targeted therapies. Detection of such mutations using next-generation sequencing data requires a computationally intensive read mapping step followed by several variant calling methods. Targeted mutation identification drastically shifts the usual tradeoff between accuracy and performance by concentrating all computations over a small portion of sequence space. Here, we present km, an efficient approach leveraging k-mer decomposition of reads to identify targeted mutations. Our approach is versatile, as it can detect single-base mutations, several types of insertions and deletions, as well as fusions. We used two independent cohorts (The Cancer Genome Atlas and Leucegene) to show that mutation detection by km is fast, accurate, and mainly limited by sequencing depth. Therefore, km allows the establishment of fast diagnostics from next-generation sequencing data and could be suitable for clinical applications.
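The k-mer idea behind km can be illustrated with a minimal sketch. This is not km's actual algorithm; it only shows how k-mer decomposition of reads can test support for a targeted mutant sequence without a read-mapping step. The sequences below are made up:

```python
# Illustrative sketch of targeted mutation detection via k-mers:
# decompose reads into a k-mer set, then ask whether every k-mer of a
# targeted (e.g. mutant) sequence is present in that set. This replaces
# genome-wide read mapping with lookups over a small sequence space.

def kmers(seq, k):
    """All k-length substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def supported(target, read_kmers, k):
    """True if every k-mer of `target` appears in the read k-mer set."""
    return all(mer in read_kmers for mer in kmers(target, k))

k = 5
reads = ["ACGTGCATT", "GTGCATTAC", "CATTACGGA"]  # overlapping toy reads
read_kmers = set().union(*(kmers(r, k) for r in reads))

print(supported("ACGTGCATTACGG", read_kmers, k))  # → True  (covered by reads)
print(supported("ACGTGGATTACGG", read_kmers, k))  # → False (SNV not in reads)
```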


Author(s):  
Tingting Gong ◽  
Vanessa M Hayes ◽  
Eva K F Chan

Abstract: Somatic structural variants (SVs), which are variants that typically impact >50 nucleotides, play a significant role in cancer development and evolution but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributed to the purity of tumour samples, tumour heterogeneity, the limitations of short-read information from NGS and sequence alignment ambiguities. In spite of active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of seven commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the seven SV callers examined in this paper. As the importance of large SVs becomes increasingly recognized in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection that should be considered when choosing SV callers.


Biostatistics ◽  
2018 ◽  
Vol 21 (3) ◽  
pp. 577-593
Author(s):  
Sixing Chen ◽  
Xihong Lin

Summary: With the advent of next-generation sequencing, investigators have access to higher-quality sequencing data. However, sequencing all samples in a study with next-generation sequencing can still be prohibitively expensive. One potential remedy is to combine next-generation sequencing data from cases with publicly available sequencing data for controls, but there can be systematic differences in the quality of the sequenced data, such as sequencing depths, between the sequenced study cases and the publicly available controls. We propose a regression calibration (RC)-based method and a maximum-likelihood method for conducting an association study with such a combined sample, accounting for differential sequencing errors between cases and controls. The methods allow adjustment for covariates, such as population stratification, as confounders. Both methods control type I error and have power comparable to an analysis conducted using the true genotypes with sufficiently high but different sequencing depths. We show that, under certain circumstances, the RC method allows analysis with a naive variance estimate (which closely approximates the true variance in practice) and standard software. We evaluate the performance of the proposed methods using simulation studies and apply them to a combined data set of exome-sequenced acute lung injury cases and healthy controls from the 1000 Genomes Project.
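The genotype-uncertainty problem that motivates this work can be illustrated with a toy posterior-dosage calculation. The error model, Hardy-Weinberg prior, and all numbers below are simplifying assumptions for illustration, not the paper's RC or maximum-likelihood estimators:

```python
# At low sequencing depth the genotype is uncertain, so instead of a
# hard call one can use the posterior expected genotype (dosage) given
# the reads. Depth-dependent shrinkage of this dosage is the systematic
# case/control difference such methods must account for. The per-read
# error model and Hardy-Weinberg prior here are deliberate
# simplifications.

def dosage(n_ref, n_alt, maf, err=0.01):
    """Posterior mean genotype (0/1/2 alt alleles) from read counts."""
    # P(alt read | genotype) for genotypes 0, 1, 2 alt alleles.
    p_alt = [err, 0.5, 1.0 - err]
    prior = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]  # HWE prior
    lik = [prior[g] * (p_alt[g] ** n_alt) * ((1 - p_alt[g]) ** n_ref)
           for g in range(3)]
    total = sum(lik)
    return sum(g * lik[g] for g in range(3)) / total

# Deep coverage pins the genotype down; shallow coverage shrinks the
# dosage toward the prior, even at the same 50% alt-read fraction.
print(round(dosage(n_ref=15, n_alt=15, maf=0.2), 3))  # confident het, ≈ 1
print(round(dosage(n_ref=1, n_alt=1, maf=0.2), 3))    # uncertain, < 1
```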

