TCGA-Assembler 2: Software Pipeline for Retrieval and Processing of TCGA/CPTAC Data

10.1101/214320 ◽

2017 ◽

Cited By ~ 2

Author(s):

Lin Wei ◽

Zhilin Jin ◽

Shengjie Yang ◽

Yanxun Xu ◽

Yitan Zhu ◽

...

Keyword(s):

Data Storage ◽

Cancer Genomics ◽

Substantial Improvement ◽

The Cancer Genome Atlas ◽

Reproducible Research ◽

Proteomics Data ◽

Software Pipeline ◽

Storage And Retrieval ◽

Cancer Genome Atlas ◽

Integrate Data

Motivation: The Cancer Genome Atlas (TCGA) program has produced huge amounts of cancer genomics data, providing unprecedented opportunities for research. In 2014, we developed TCGA-Assembler (Zhu et al., 2014), a software pipeline for retrieval and processing of public TCGA data. In 2016, TCGA data were transferred from the TCGA data portal to the Genomic Data Commons (GDC), which is supported by a different set of data storage and retrieval mechanisms. In addition, new proteomics data of TCGA samples have been generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) program, which were not available for downloading through TCGA-Assembler. It is desirable to acquire and integrate data from both GDC and CPTAC. Results: We develop TCGA-Assembler 2 (TA2) to automatically download and integrate data from GDC and CPTAC. We make substantial improvements to the functionality of TA2 to enhance user experience and software performance. TA2, together with its previous version, has helped more than 2,000 researchers from 64 countries to access and utilize TCGA and CPTAC data in their research. Availability of TA2 will continue to allow existing and new users to conduct reproducible research based on TCGA and CPTAC data. Availability: http://www.compgenome.org/TCGA-Assembler/ Contact: [email protected] or [email protected]

Extending TCGA queries to automatically identify analogous genomic data from dbGaP

F1000Research ◽

10.12688/f1000research.9837.1 ◽

2017 ◽

Vol 6 ◽

pp. 319

Author(s):

Erin K. Wagner ◽

Satyajeet Raje ◽

Liz Amos ◽

Jessica Kurata ◽

Abhijit S. Badve ◽

...

Keyword(s):

Genomic Data ◽

The Cancer Genome Atlas ◽

Genomic Research ◽

Reproducible Research ◽

Software Pipeline ◽

Individual Level ◽

Related Data ◽

Cancer Genome Atlas ◽

Existing Data ◽

Genome Atlas

Data sharing is critical to advancing genomic research: it reduces the demand to collect new data by enabling reuse and combination of existing data, and it promotes reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level genotype-phenotype cancer-related data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that allows researchers to discover relevant genomic data from dbGaP, based on matching TCGA metadata. The resulting research provides an easy-to-use tool to connect these two data sources.

Molecular Insights into the Classification of Luminal Breast Cancers: The Genomic Heterogeneity of Progesterone-Negative Tumors

International Journal of Molecular Sciences ◽

10.3390/ijms20030510 ◽

2019 ◽

Vol 20 (3) ◽

pp. 510 ◽

Cited By ~ 6

Author(s):

Gianluca Lopez ◽

Jole Costanza ◽

Matteo Colleoni ◽

Laura Fontana ◽

Stefano Ferrero ◽

...

Keyword(s):

Cancer Genomics ◽

Molecular Taxonomy ◽

The Cancer Genome Atlas ◽

Rank Test ◽

Breast Cancers ◽

Cancer Genes ◽

Kaplan Meier ◽

ER Positive ◽

Cancer Genome Atlas ◽

Student’s t

Estrogen receptor (ER)-positive progesterone receptor (PR)-negative breast cancers are infrequent but clinically challenging. Despite the volume of genomic data available on these tumors, their biology remains poorly understood. Here, we aimed to identify clinically relevant subclasses of ER+/PR− breast cancers based on their mutational landscape. The Cancer Genomics Data Server was interrogated for mutational and clinical data of all ER+ breast cancers with information on PR status from The Cancer Genome Atlas (TCGA), Memorial Sloan Kettering (MSK), and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) projects. Clustering analysis was performed using the gplots, ggplot2, and ComplexHeatmap packages. Comparisons between groups were performed using the Student’s t-test and the test of Equal or Given Proportions. Survival curves were built according to the Kaplan–Meier method; differences in survival were assessed with the log-rank test. A total of 3570 ER+ breast cancers (PR− n = 959, 27%; PR+ n = 2611, 73%) were analyzed. Mutations in well-known cancer genes such as TP53, GATA3, CDH1, HER2, and BRAF were private to or enriched in PR− tumors. Mutual exclusivity analysis revealed the presence of four molecular clusters with significantly different prognosis on the basis of PIK3CA and TP53 status. ER+/PR− breast cancers are genetically heterogeneous and encompass a variety of distinct entities in terms of prognostic and predictive information.
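As an illustration of the Kaplan–Meier method named in this abstract, the following is a minimal sketch of the product-limit estimator. It is not the authors' code (the abstract reports R packages); the function name `kaplan_meier` and the toy follow-up data are hypothetical and chosen only to show how censored observations leave the risk set without lowering the survival estimate.

```python
def kaplan_meier(durations, events):
    """Product-limit survival estimate.

    durations: follow-up time for each subject
    events:    1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)          # subjects still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical toy cohort: five subjects, two of them censored.
toy_durations = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_durations, toy_events)
for time, prob in toy_curve:
    print(f"t={time}: S(t)={prob:.2f}")
```

Comparing two such curves (for example PR− versus PR+) would additionally require a log-rank test, which this sketch does not implement.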

Using Semantic Web Technologies to Enable Cancer Genomics Discovery at Petabyte Scale

Cancer Informatics ◽

10.1177/1176935118774787 ◽

2018 ◽

Vol 17 ◽

pp. 117693511877478 ◽

Cited By ~ 2

Author(s):

Jovan Cejovic ◽

Jelena Radenkovic ◽

Vladimir Mladenovic ◽

Adam Stanojevic ◽

Milica Miletic ◽

...

Keyword(s):

Semantic Web ◽

Cancer Genomics ◽

Large Data ◽

The Cancer Genome Atlas ◽

Data Sets ◽

Sequencing Data ◽

Semantic Web Technologies ◽

Data Set ◽

Seamless Integration ◽

Cancer Genome Atlas

Increased efforts in cancer genomics research and bioinformatics are producing tremendous amounts of data. These data are diverse in origin, format, and content. As the amount of available sequencing data increases, technologies that make them discoverable and usable are critically needed. In response, we have developed a Semantic Web–based Data Browser, a tool allowing users to visually build and execute ontology-driven queries. This approach simplifies access to available data and improves the process of using them in analyses on the Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org). The Data Browser makes large data sets easily explorable and simplifies the retrieval of specific data of interest. Although initially implemented on top of The Cancer Genome Atlas (TCGA) data set, the Data Browser’s architecture allows for seamless integration of other data sets. By deploying it on the CGC, we have enabled remote researchers to access data and perform collaborative investigations.

Ordino: a visual cancer analysis tool for ranking and exploring genes, cell lines and tissue samples

Bioinformatics ◽

10.1093/bioinformatics/btz009 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3140-3142 ◽

Cited By ~ 7

Author(s):

Marc Streit ◽

Samuel Gratzl ◽

Holger Stitz ◽

Andreas Wernitznig ◽

Thomas Zichner ◽

...

Keyword(s):

Cell Lines ◽

Cancer Genomics ◽

Cancer Cell Line ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Analysis Tool ◽

Tissue Samples ◽

Web Based ◽

Prioritization Process ◽

Cancer Genome Atlas

Summary: Ordino is a web-based analysis tool for cancer genomics that allows users to flexibly rank, filter and explore genes, cell lines and tissue samples based on pre-loaded data, including The Cancer Genome Atlas, the Cancer Cell Line Encyclopedia and manually uploaded information. Interactive tabular data visualization that facilitates the user-driven prioritization process forms a core component of Ordino. Detail views of selected items complement the exploration. Findings can be stored, shared and reproduced via the integrated session management. Availability and implementation: Ordino is publicly available at https://ordino.caleydoapp.org. The source code is released at https://github.com/Caleydo/ordino under the Mozilla Public License 2.0. Supplementary information: Supplementary data are available at Bioinformatics online.

Abstract 2348: Low-cost and accurate human leukocyte antigen (HLA) class I typing of The Cancer Genome Atlas on the Seven Bridges Cancer Genomics Cloud

10.1158/1538-7445.am2018-2348 ◽

2018 ◽

Author(s):

Raunaq Malhotra ◽

Alexandar Krasnitz ◽

Anurag Sethi ◽

Erik Lehnert ◽

Elizabeth H. Williams ◽

...

Keyword(s):

Human Leukocyte Antigen ◽

Cancer Genomics ◽

Low Cost ◽

Human Leukocyte ◽

HLA Class I ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Leukocyte Antigen ◽

Cancer Genome Atlas ◽

Genome Atlas

Bladder cancer genomics

Urologia Journal ◽

10.1177/0391560319899011 ◽

2020 ◽

Vol 87 (2) ◽

pp. 49-56

Author(s):

Salvatore Siracusano ◽

Riccardo Rizzetto ◽

Antonio Benito Porcaro

Keyword(s):

Bladder Cancer ◽

Cancer Genomics ◽

Treatment Strategies ◽

Genetic Alterations ◽

Point Of View ◽

The Cancer Genome Atlas ◽

Molecular Alterations ◽

Cancer Genome Atlas ◽

Polymerase Chain ◽

Muscle Invasive

For many years, the treatment of bladder cancer was limited to surgery and to immunotherapy or chemotherapy. More recently, extensive analysis of molecular alterations has led to novel treatment approaches. The advent of polymerase chain reaction and genomic hybridization techniques has made it possible to investigate the alterations involved in bladder cancer at the DNA level. On this basis, bladder cancers can be classified as papillary or non-papillary according to their genetic alterations: activation of or mutations in FGFR3 in papillary tumors, and inactivation of or mutations in TP53 and RB1 in non-papillary tumors. Patterns of gene expression also make it possible to distinguish basal and luminal subtypes, as reported in breast cancer. Basal cancers show squamous and sarcomatoid pathological findings, while luminal cancers show papillary features and FGFR3 mutations. Specific studies have demonstrated that luminal cancers are associated with secondary muscle-invasive cancer, while basal tumors are related to advanced disease since they are often metastatic at diagnosis. Moreover, from a therapeutic point of view, different researchers have shown that DNA mutations are related to the sensitivity of bladder cancer to cisplatin chemotherapy. From this perspective, molecular subtyping of bladder cancer might make it possible to identify the patients who can safely avoid neoadjuvant chemotherapy because of a likely low response to systemic chemotherapy (chemoresistant tumors). In this context, the Cancer Genome Atlas (TCGA) project has improved knowledge of the molecular targets of invasive urothelial cancers, allowing researchers to hypothesize that agents targeting these genomic alterations, which occur in about 68% of muscle-invasive cancers, may be an effective strategy in managing these cancers. A future goal will be to combine treatment strategies for invasive bladder cancers according to their genetic mutational load as defined by molecular pathology.

CloneSig can jointly infer intra-tumor heterogeneity and mutational signature activity in bulk tumor sequencing data

Nature Communications ◽

10.1038/s41467-021-24992-y ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Judith Abécassis ◽

Fabien Reyal ◽

Jean-Philippe Vert

Keyword(s):

Tumor Heterogeneity ◽

Cancer Genomics ◽

Computational Method ◽

The Cancer Genome Atlas ◽

Sequencing Data ◽

Cancer Dataset ◽

Whole Exome Sequencing Data ◽

Cancer Genome Atlas ◽

Pan Cancer ◽

Mutational Processes

Systematic DNA sequencing of cancer samples has highlighted the importance of two aspects of cancer genomics: intra-tumor heterogeneity (ITH) and mutational processes. These two aspects may not always be independent, as different mutational processes could be involved in different stages or regions of the tumor, but existing computational approaches to study them largely ignore this potential dependency. Here, we present CloneSig, a computational method to jointly infer ITH and mutational processes in a tumor from bulk-sequencing data. Extensive simulations show that CloneSig outperforms current methods for ITH inference and detection of mutational processes when the distribution of mutational signatures changes between clones. Applied to a large cohort of 8,951 tumors with whole-exome sequencing data from The Cancer Genome Atlas, and on a pan-cancer dataset of 2,632 whole-genome sequencing tumor samples from the Pan-Cancer Analysis of Whole Genomes initiative, CloneSig obtains results overall coherent with previous studies.

Integration and analysis of CPTAC proteomics data in the context of cancer genomics in the cBioPortal

10.1101/247718 ◽

2018 ◽

Author(s):

Pamela Wu ◽

Zachary J Heins ◽

James T Muller ◽

Adam A Abeshouse ◽

Yichao Sun ◽

...

Keyword(s):

Mass Spectrometry ◽

Clinical Data ◽

Cancer Genomics ◽

Ovarian Tumors ◽

The Cancer Genome Atlas ◽

Mass Spectrometry Data ◽

Proteomics Data ◽

Data Types ◽

Level Data ◽

Graphical Summary

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced extensive mass spectrometry based proteomics data for selected breast, colon and ovarian tumors from The Cancer Genome Atlas (TCGA). We have incorporated the CPTAC proteomics data into the cBioPortal to support easy exploration and integrative analysis of these proteomic datasets in the context of the clinical and genomics data from the same tumors. cBioPortal is an open source platform for exploring, visualizing, and analyzing multi-dimensional cancer genomics and clinical data. The public instance of the cBioPortal (http://cbioportal.org/) hosts more than 100 cancer genomics studies including all of the data from TCGA. Its biologist-friendly interface provides many rich analysis features, including a graphical summary of gene-level data across multiple platforms, correlation analysis between genes or other data types, survival analysis, and network visualization. Here, we present the integration of the CPTAC mass spectrometry based proteomics data into the cBioPortal, consisting of 77 breast, 95 colorectal, and 174 ovarian tumors that already have been profiled by TCGA for mutations, copy number alterations, gene expression, and DNA methylation. As a result, the CPTAC data can now be easily explored and analyzed in the cBioPortal in the context of clinical and genomics data. By integrating CPTAC data into cBioPortal, limitations of TCGA proteomics array data can be overcome while also providing a user-friendly web interface, a web API and an R client to query the mass spectrometry data together with genomic, epigenomic, and clinical data.

Pan-Cancer Exploration of mRNA Mediated Dysregulated Pathways in the Cancer Genomics Cloud

10.1101/599225 ◽

2019 ◽

Author(s):

Margaret Linan ◽

Junwen Wang ◽

Valentin Dinu

Keyword(s):

Cancer Genomics ◽

MAPK Signaling ◽

The Cancer Genome Atlas ◽

Protein Coding ◽

PageRank Algorithm ◽

Metabolism Pathway ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Kidney Liver ◽

Pan Cancer

We performed a comprehensive pan-cancer analysis in the Cancer Genomics Cloud of HTSeq-FPKM normalized protein coding mRNA data from 17 cancer projects in the Cancer Genome Atlas: Adrenal Gland, Bile Duct, Bladder, Brain, Breast, Cervix, Colorectal, Esophagus, Head and Neck, Kidney, Liver, Lung, Pancreas, Prostate, Stomach, Thyroid and Uterus. The PoTRA algorithm was applied to the normalized mRNA protein coding data and detected dysregulated pathways that can be implicated in the pathogenesis of these cancers. The PageRank algorithm was then applied to the PoTRA results to find the most influential dysregulated pathways among all 17 cancer types. "Pathways in cancer" is the most commonly dysregulated pathway, and the MAPK signaling pathway is the most influential (PageRank score = 0.2034), while the purine metabolism pathway is the most significantly dysregulated metabolic pathway.
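To make the PageRank step in this abstract concrete, the following is a minimal power-iteration sketch. It is not the authors' pipeline: the abstract does not say how the PoTRA results were turned into a graph, so the `links` dictionary, the pathway labels and the damping factor of 0.85 are all hypothetical assumptions used only to show how a score of the kind reported (e.g. 0.2034) arises as a share of total rank.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph.

    links: dict mapping each node to the list of nodes it points to.
    Returns a dict of node -> score; the scores sum to 1.
    """
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new_rank[t] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph; edges are illustrative, not taken from the paper.
links = {
    "PI3K": ["MAPK"],
    "Purine": ["MAPK"],
    "Apoptosis": ["MAPK", "PI3K"],
    "MAPK": ["PI3K"],
}
ranks = pagerank(links)
for name, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

In this toy graph the node with the most incoming edges receives the largest score, which is the sense in which a pathway can be called "most influential".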

Genomic, Transcriptomic, Epigenetic, and Immune Profiling of Mucinous Breast Cancer

JNCI Journal of the National Cancer Institute ◽

10.1093/jnci/djz023 ◽

2019 ◽

Vol 111 (7) ◽

pp. 742-746 ◽

Cited By ~ 7

Author(s):

Bastien Nguyen ◽

Isabelle Veys ◽

Sophia Leduc ◽

Yacine Bareche ◽

Samira Majjaj ◽

...

Keyword(s):

Breast Cancer ◽

Cancer Genomics ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Molecular Architecture ◽

Microscopic Evaluation ◽

ER Positive ◽

Cancer Genome Atlas ◽

Mucinous Breast Cancer ◽

Genome Atlas

Although invasive ductal breast cancer (IDC) represents the most common histological type of breast cancer, minor subtypes exist such as mucinous breast cancer (MuBC). MuBC are distinguished by tumor cells floating in extracellular mucin. MuBC patients are generally older and associated with a favorable prognosis. To unravel the molecular architecture of MuBC, we applied low-pass whole-genome sequencing and microscopic evaluation of stromal tumor infiltrating lymphocytes to 30 MuBC from a retrospective institutional cohort. We further analyzed two independent datasets from the International Cancer Genomics Consortium and The Cancer Genome Atlas. Genomic data (n = 26 MuBC, n = 535 estrogen receptor [ER] positive/HER2-negative IDC), methylation data (n = 28 MuBC, n = 529 ER-positive/HER2-negative IDC), and transcriptomic data (n = 27 MuBC, n = 467 ER-positive/HER2-negative IDC) were analyzed. MuBC was characterized by low tumor infiltrating lymphocyte levels (median = 0.0%, average = 3.4%, 95% confidence interval = 1.9% to 4.9%). Compared with IDC, MuBC had a lower genomic instability (P = .01, two-sided Mann-Whitney U test) and a decreased prevalence of PIK3CA mutations (39.7% in IDC vs 6.7% in MuBC, P = .01 in the International Cancer Genomics Consortium; and 34.8% vs 0.0%, P = .02 in The Cancer Genome Atlas, two-sided Fisher’s exact test). Finally, our report identifies aberrant DNA methylation of MUC2 as a possible cause of extracellular production of mucin in MuBC.
