An integrative somatic mutation analysis to identify pathways linked with survival outcomes across 19 cancer types

10.1101/017582 ◽

2015 ◽

Author(s):

Sunho Park ◽

Seung-Jun Kim ◽

Donghyeon Yu ◽

Samuel Pena-Llopis ◽

Jianjiong Gao ◽

...

Keyword(s):

Somatic Mutation ◽

Gene Networks ◽

Drug Targets ◽

Somatic Mutations ◽

Cancer Genomics ◽

The Cancer Genome Atlas ◽

Current Therapy ◽

Cancer Type ◽

Consensus Clustering ◽

TCGA Dataset

Identification of altered pathways that are clinically relevant across human cancers is a key challenge in cancer genomics. We developed a network-based algorithm to integrate somatic mutation data with gene networks and pathways, in order to identify pathways altered by somatic mutations across cancers. We applied our approach to The Cancer Genome Atlas (TCGA) dataset of somatic mutations in 4,790 cancer patients with 19 different types of malignancies. Our analysis identified cancer-type-specific altered pathways enriched with known cancer-relevant genes and drug targets. Consensus clustering using gene expression datasets that included 4,870 patients from TCGA and multiple independent cohorts confirmed that the altered pathways could be used to stratify patients into subgroups with significantly different clinical outcomes. Of particular significance, certain patient subpopulations with poor prognosis were identified because they had specific altered pathways for which there are available targeted therapies. These findings could be used to tailor and intensify therapy in these patients, for whom current therapy is suboptimal.

cancerAlign: Stratifying tumors by unsupervised alignment across cancer types

10.1101/2020.11.17.387860 ◽

2020 ◽

Author(s):

Bowen Gao ◽

Yunan Luo ◽

Jianzhu Ma ◽

Sheng Wang

Keyword(s):

Somatic Mutation ◽

Large Scale ◽

Cancer Genomics ◽

Population Level ◽

Substantial Improvement ◽

Cancer Type ◽

Consensus Clustering ◽

Learning To Learn ◽

Mutation Profile ◽

Cancer Types

Tumor stratification, which aims to cluster tumors into biologically meaningful subtypes, is the key step towards personalized treatment. Large-scale profiled cancer genomics data enable us to develop computational methods for tumor stratification. However, most existing approaches consider only tumors from an individual cancer type during clustering, which overlooks common patterns across cancer types and leaves the clustering vulnerable to noise within that cancer type. To address these challenges, we proposed cancerAlign to map tumors of the target cancer type into latent spaces of other source cancer types. These tumors were then clustered in each latent space rather than the original space in order to exploit shared patterns across cancer types. Because aligned tumor samples across cancer types are lacking, cancerAlign used adversarial learning to learn the mapping at the population level. It then used consensus clustering to integrate cluster labels from different source cancer types. We evaluated cancerAlign on 7,134 tumors spanning 24 cancer types from TCGA and observed substantial improvement in tumor stratification and cancer gene prioritization. We further revealed the transferability across cancer types, which reflected the similarity among them based on the somatic mutation profile. cancerAlign is an unsupervised approach that provides deeper insights into the heterogeneous and rapidly accumulating somatic mutation profile and can also be applied to other genome-scale molecular information. Availability: https://github.com/bowen-gao/cancerAlign
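
As a rough illustration of the consensus-clustering step mentioned in this abstract, the sketch below combines cluster labels obtained in several latent spaces into a co-association matrix (the fraction of labelings that place two tumors in the same cluster) and then groups tumors whose agreement exceeds a threshold. The example labels, the 0.5 threshold, and the function names are illustrative assumptions; this is not cancerAlign's implementation, which is available at the repository above.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of labelings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in labelings if labels[i] == labels[j])
        matrix[i][j] = matrix[j][i] = agree / len(labelings)
    return matrix


def consensus_clusters(labelings, threshold=0.5):
    """Link samples whose co-association exceeds the threshold (an assumed
    cutoff) and return the connected components as consensus clusters."""
    matrix = co_association(labelings)
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical labels for five tumors, one list per source latent space.
labelings = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
clusters = consensus_clusters(labelings)
```

Cluster identifiers need not match across labelings, because only co-membership is counted; this is why labels from different latent spaces can be integrated without aligning them first.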

Inactivating Mutations of the IK Gene Weaken Ku80/Ku70-Mediated DNA Repair and Sensitize Endometrial Cancer to Chemotherapy

Cancers ◽

10.3390/cancers13102487 ◽

2021 ◽

Vol 13 (10) ◽

pp. 2487

Author(s):

Chao Gao ◽

Guangxu Jin ◽

Elizabeth Forbes ◽

Lingegowda S. Mangala ◽

Yingmei Wang ◽

...

Keyword(s):

DNA Repair ◽

Endometrial Cancer ◽

Protein Interactions ◽

Somatic Mutations ◽

The Cancer Genome Atlas ◽

Response To Chemotherapy ◽

TCGA Dataset ◽

Inactivating Mutations

IK is a mitotic factor that promotes cell cycle progression. Our previous investigation of 271 endometrial cancer (EC) samples from The Cancer Genome Atlas (TCGA) dataset showed that IK somatic mutations were enriched in a cluster of patients with high-grade and high-stage cancers, and that this group had longer survival. This study provides insight into how IK somatic mutations contribute to EC pathophysiology. We analyzed the somatic mutational landscape of the IK gene in 547 EC patients using the expanded TCGA dataset. Co-immunoprecipitation and mass spectrometry were used to identify protein interactions. In vitro and in vivo experiments were used to evaluate IK’s role in EC. Patients with IK-inactivating mutations had longer survival during 10-year follow-up. Frameshift and stop-gain mutations were common and were associated with decreased IK expression. IK knockdown led to enrichment of G2/M phase cells, inactivation of DNA repair signaling mediated by heterodimerization of Ku80 and Ku70, and sensitization of EC cells to cisplatin treatment. IK/Ku80 mutations were accompanied by higher mutation rates and associated with significantly better overall survival. Inactivating mutations of the IK gene and loss of IK protein expression were associated with weakened Ku80/Ku70-mediated DNA repair, increased mutation burden, and better response to chemotherapy in patients with EC.

Integrating Genetic and Transcriptomic Data to Reveal Pathogenesis and Prognostic Markers of Pancreatic Adenocarcinoma

Frontiers in Genetics ◽

10.3389/fgene.2021.747270 ◽

2021 ◽

Vol 12 ◽

Author(s):

Kaisong Bai ◽

Tong Zhao ◽

Yilong Li ◽

Xinjian Li ◽

Zhantian Zhang ◽

...

Keyword(s):

Gene Expression ◽

Pancreatic Adenocarcinoma ◽

Somatic Mutation ◽

Somatic Mutations ◽

Prognostic Markers ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Genomic Variation ◽

The Cancer Genome Atlas ◽

Driver Genes

Pancreatic adenocarcinoma (PAAD) is one of the deadliest malignancies, and mortality from PAAD has continued to rise despite substantial improvements in mortality for other major cancers. Although multiple studies of PAAD exist, few have dissected its oncogenic mechanisms based on genomic variation. In this study, we integrated somatic mutation data and gene expression profiles obtained by high-throughput sequencing to characterize the pathogenesis of PAAD. The mutation profile, containing 182 samples with 25,470 somatic mutations, was obtained from The Cancer Genome Atlas (TCGA). The mutation landscape was generated, and somatic mutations in PAAD were found to show a preference for mutation location. Combining the mutation matrix with gene expression profiles identified 31 driver genes that were closely associated with tumor cell invasion and apoptosis. Co-expression networks were constructed based on 461 genes significantly associated with the driver genes, and the hub gene FAM133A in the network was found to be associated with tumor metastasis. Further, the cascade relationship of somatic mutation, long non-coding RNA (lncRNA), and microRNA (miRNA) was constructed to reveal a new mechanism for the involvement of mutations in post-transcriptional regulation. We also identified prognostic markers that are significantly associated with overall survival (OS) of PAAD patients and constructed a risk score model to estimate patients’ survival risk. In summary, our study revealed the pathogenic mechanisms and prognostic markers of PAAD, providing theoretical support for the development of precision medicine.

A Statistical Framework for Evolutionary Analysis of Recurrent Somatic Mutations in Cancers

10.1101/2020.04.10.036095 ◽

2020 ◽

Author(s):

Xun Gu

Keyword(s):

Somatic Mutations ◽

Cancer Genomics ◽

Computational Procedure ◽

The Cancer Genome Atlas ◽

Component Model ◽

Evolutionary Analysis ◽

Cancer Genes ◽

Component Mixture ◽

Two Component ◽

Empirical Bayesian Method

Current cancer genomics databases have accumulated millions of somatic mutations that remain to be further explored, facilitating large-scale high-throughput analyses of the underlying mechanisms that may contribute to malignant initiation or progression. Because passenger mutations (unrelated to cancer) predominate, the challenge is to identify the somatic mutations that are cancer-driving. Under the notion that carcinogenesis is a form of somatic-cell evolution, we developed a two-component mixture model that enables the following analyses. (i) We formulated a quasi-likelihood approach to test whether the two-component model is significantly better than a single-component model, which can be used to predict new cancer genes. (ii) We implemented an empirical Bayesian method to calculate, for every site of a gene, the posterior probability that the site is cancer-driving, which can be used to predict new driver sites. (iii) We developed a computational procedure to calculate the somatic selection intensity at driver sites and passenger sites, respectively, as well as site-specific profiles for all sites. Using these newly developed methods, we comprehensively analyzed 294 known cancer genes based on The Cancer Genome Atlas (TCGA) database.
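
To make the idea of a two-component mixture with empirical Bayesian posteriors more concrete, here is a minimal sketch. It assumes, purely for illustration, that per-site mutation counts follow a mixture of a low-rate Poisson (passenger sites) and a high-rate Poisson (driver sites), fits the mixture by expectation-maximization, and then plugs the fitted parameters into Bayes' rule. The abstract does not specify the paper's actual likelihood, so the Poisson choice, the starting values, the example counts, and all function names are assumptions, and the plain log-likelihood comparison at the end is only a stand-in for the quasi-likelihood test the author describes.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of observing k mutations at a site with the given rate."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def driver_posterior(k, weight, low, high):
    """Posterior probability that a site with k mutations is in the high-rate component."""
    driver = weight * poisson_pmf(k, high)
    passenger = (1.0 - weight) * poisson_pmf(k, low)
    return driver / (driver + passenger)


def fit_mixture(counts, iterations=200):
    """Estimate (weight, low, high) by expectation-maximization."""
    n = len(counts)
    mean = sum(counts) / n
    weight, low, high = 0.1, 0.5 * mean, 4.0 * mean  # crude, assumed starting values
    for _ in range(iterations):
        # E-step: responsibility of the high-rate component for each site.
        post = [driver_posterior(k, weight, low, high) for k in counts]
        total = sum(post)
        # M-step: re-estimate mixing weight and both rates (clamped away from zero).
        weight = min(max(total / n, 1e-9), 1.0 - 1e-9)
        high = max(sum(p * k for p, k in zip(post, counts)) / max(total, 1e-12), 1e-9)
        low = max(sum((1.0 - p) * k for p, k in zip(post, counts)) / max(n - total, 1e-12), 1e-9)
    return weight, low, high


def log_likelihood(counts, weight, low, high):
    """Log-likelihood of the counts under the mixture (weight=0 gives a single Poisson)."""
    return sum(
        math.log(weight * poisson_pmf(k, high) + (1.0 - weight) * poisson_pmf(k, low))
        for k in counts
    )


# Hypothetical per-site mutation counts for one gene: mostly sparse, three recurrent sites.
counts = [0, 1, 0, 2, 1, 0, 1, 0, 0, 1, 1, 0, 2, 0, 1, 0, 1, 0, 0, 1, 12, 15, 9]
weight, low, high = fit_mixture(counts)
posteriors = [driver_posterior(k, weight, low, high) for k in counts]

# Single-component baseline: one Poisson whose rate is the sample mean.
mean = sum(counts) / len(counts)
ll_single = log_likelihood(counts, 0.0, mean, mean)
ll_mixture = log_likelihood(counts, weight, low, high)
```

On these made-up counts the three recurrent sites receive posteriors close to 1 and the sparse sites close to 0, which is the kind of site-level ranking the empirical Bayesian step is meant to provide.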

Overexpression of lncRNA SNGH3 Predicts Unfavorable Prognosis and Clinical Outcomes in Human Cancers: Evidence from a Meta-Analysis

BioMed Research International ◽

10.1155/2020/7974034 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Yaofei Jiang ◽

Lulu Le

Keyword(s):

Clinical Outcomes ◽

Histological Grade ◽

Meta Analysis ◽

Disease Free Survival ◽

The Cancer Genome Atlas ◽

Cochrane Library ◽

Cancer Type ◽

Free Survival ◽

Human Cancers ◽

TCGA Dataset

Long noncoding RNAs (lncRNAs) have been confirmed to play a crucial role in human disease, especially in tumor development and progression. Small nucleolar RNA host gene 3 (SNHG3), a newly identified lncRNA, has been found to be dysregulated in various cancers. Nevertheless, the results remain controversial. Thus, we aimed to analyze the available data comprehensively to clarify the association between SNHG3 expression and clinical outcomes in multiple cancers. We searched the PubMed, Web of Science, Cochrane Library, Embase, and MEDLINE databases to identify eligible articles. STATA software was used to calculate the hazard ratio (HR) and odds ratio (OR) with 95% confidence interval (95% CI) for survival outcomes and clinical parameters, respectively. In addition, data from The Cancer Genome Atlas (TCGA) dataset were extracted to verify the results of our meta-analysis. Thirteen studies totaling 919 cancer patients were included in this meta-analysis. The results demonstrated that high SNHG3 expression was significantly associated with poor overall survival (OS) (HR=2.53, 95% CI: 1.94-3.31) in cancers, disease-free survival (DFS) (HR=3.89, 95% CI: 1.34-11.3), and recurrence-free survival (RFS) (HR=2.42, 95% CI: 1.14-5.15) in hepatocellular carcinoma. Analyses stratified by analysis method, sample size, follow-up time, and cancer type further verified the prognostic value of SNHG3. Additionally, patients with high SNHG3 expression tended to have more advanced clinical stage, higher histological grade, earlier distant metastasis, and earlier lymph node metastasis. Mining of the TCGA dataset confirmed that SNHG3 was upregulated in various cancers and predicted worse OS and DFS. Overexpressed SNHG3 was strongly associated with poor survival and clinical outcomes in human cancers and therefore can serve as a promising biomarker for predicting patients’ prognosis.
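
For readers unfamiliar with how hazard ratios are pooled, the following sketch shows the standard arithmetic: a standard error on the log scale is recovered from a 95% CI as (ln upper − ln lower) / (2 × 1.96), and study estimates are then combined with inverse-variance weights. The abstract does not state whether a fixed- or random-effects model was used, so the fixed-effect pooling here is an assumption, the three study-level HRs are hypothetical, and only the OS interval (1.94-3.31) is taken from the abstract.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def log_hr_se(lower, upper):
    """Standard error of log(HR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling.

    studies: list of (hr, lower, upper) tuples.
    Returns the pooled (hr, lower, upper).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, lower, upper in studies:
        w = 1.0 / log_hr_se(lower, upper) ** 2
        total_weight += w
        weighted_sum += w * math.log(hr)
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z95 * pooled_se),
        math.exp(pooled_log + Z95 * pooled_se),
    )


# Standard error implied by the pooled OS interval reported in the abstract.
se_os = log_hr_se(1.94, 3.31)

# Hypothetical study-level hazard ratios, for illustration only.
hypothetical_studies = [(2.0, 1.2, 3.33), (3.0, 1.5, 6.0), (2.5, 1.4, 4.46)]
pooled_hr, pooled_lower, pooled_upper = pool_fixed_effect(hypothetical_studies)
```

Pooling on the log scale keeps the interval symmetric around log(HR), which is why the confidence limits are exponentiated only at the end.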

Insight of characteristic of mutation sequences in human cancers via an unsupervised neural network approach.

10.21203/rs.3.rs-115811/v2 ◽

2020 ◽

Author(s):

Ji Hongchen ◽

Li Junjie ◽

Zhang Qiong ◽

Yang Jingyue ◽

Tian Fei ◽

...

Keyword(s):

Neural Network ◽

Clinical Features ◽

Somatic Mutation ◽

Somatic Mutations ◽

Short Term Memory ◽

The Cancer Genome Atlas ◽

Self Organizing Map ◽

Neural Network Approach ◽

Unsupervised Neural Network ◽

Flanking Sequences

Background: Mutation processes leave different signatures in genes. Previous studies have suggested that both the mutated and flanking bases influence somatic mutation characteristics. However, the understanding of how flanking sequences influence somatic mutation characteristics is limited. Materials and methods: We constructed a long short-term memory – self organizing map (LSTM-SOM) unsupervised neural network. By extracting mutated sequence features via LSTM and clustering similar features with SOM, somatic mutations in The Cancer Genome Atlas database were clustered into classes of mutant sequences, named mutation blots (MBs), according to their mutation type and flanking sequences. The relationship between MBs and cancer characteristics was then analyzed. Finally, we clustered the patients into different classes according to their MB composition using the K-means method, and then studied the differences in clinical features and survival between classes. Results: Ten MBs were obtained from 2,141,527 somatic mutations. Different features in mutated bases and flanking sequences were revealed among MBs. MBs reflect both the site and pathological features of cancers. MBs were related to clinical features, including age, sex, and cancer stage. The class of MB in a given gene is associated with survival. Finally, patients were clustered into 7 classes according to MB composition. Significant differences in survival and clinical features were observed among different patient classes. Conclusions: Our study provides a novel method for analyzing the information of mutant sequences and reveals the extensive relationships among mutant sequences, clinical features, and cancer patient survival.

DGAT-onco: A powerful method to detect oncogenes by integrating differential mutational analysis and functional impacts of somatic mutations

10.1101/2020.02.15.947085 ◽

2020 ◽

Author(s):

Haoyang Zhang ◽

Junkang Wei ◽

Zifeng Liu ◽

Xun Liu ◽

Yutian Chong ◽

...

Keyword(s):

Somatic Mutations ◽

Mutational Analysis ◽

Profile Analysis ◽

Classification Performance ◽

The Cancer Genome Atlas ◽

Alternative Methods ◽

Cancer Development ◽

Frequent Mutations ◽

Cancer Genome Atlas ◽

TCGA Dataset

Motivation: Oncogenes are genes whose malfunctions play critical roles in cancer development, and their discovery is a major aim of the study of cancer mechanisms. Oncogenes with frequent mutations have been identified by counting mutation frequency, and it is believed that many more could be discovered by differential mutational profile analysis. However, current methods commonly use only mutations in the cancer population, which introduces an obvious bias in background mutation modelling. Methods: To predict oncogenes efficiently, we developed a method, DGAT-onco, that analyzes the frequency distribution and functional impacts of mutations in both cancer and natural populations. Our method can capture the mutational differences between the two populations and provide a comprehensive view of the genomic basis underlying cancer development. DGAT-onco was constructed from germline mutations from the 1000 Genomes project and somatic mutations of 33 cancer types from The Cancer Genome Atlas (TCGA) dataset. Its reliability was verified on an independent test set including 19 cancers from other sources. Results: We demonstrated that our method is more effective than alternative methods in discovering oncogenes. This approach achieves higher classification performance in oncogene discovery than 6 alternative methods, and 22.8% of the significant genes identified by our method were verified as oncogenes by the Cancer Gene Census (CGC). Availability: DGAT-onco is available at https://github.com/zhanghaoyang0/[email protected] or [email protected]

Truncation and Motif Based Pan-Cancer Analysis Highlights Novel Tumor Suppressing Kinases

10.1101/254813 ◽

2018 ◽

Author(s):

Andrew M. Hudson ◽

Natalie L. Stephenson ◽

Cynthia Li ◽

Eleanor Trotter ◽

Adam J. Fletcher ◽

...

Keyword(s):

Gastric Cancer ◽

Cancer Genomics ◽

Cancer Cell Line ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Type ◽

Loss Of Function ◽

Gastric Cancer Cells ◽

JNK Pathway ◽

Pan Cancer

A major challenge in cancer genomics is identifying driver mutations from the large number of neutral passenger mutations within a given tumor. Here, we utilize motifs critical for kinase activity to functionally filter genomic data to identify driver mutations that would otherwise be lost within mutational noise. In the first step of our screen, we define a putative tumor suppressing kinome by identifying kinases with truncation mutations occurring within or before the kinase domain. We aligned these kinase sequences and, utilizing data from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas databases, identified amino acids that represent predicted hotspots for loss-of-function (LOF) mutations. The functional consequences of new LOF mutations were validated and the top 15 hotspot LOF residues were used in a pan-cancer analysis to define the tumor-suppressing kinome. A ranked list revealed MAP2K7 as a candidate tumor suppressor in gastric cancer, despite the mutational frequency of MAP2K7 falling within the mutational noise for this cancer type. The majority of mutations in MAP2K7 abolished catalytic activity compared to the wild type kinase, consistent with a tumor suppressive role for MAP2K7 in gastric cancer. Furthermore, reactivation of the JNK pathway in gastric cancer cells harboring LOF mutations in MAP2K7 or JNK1 suppresses clonogenicity and growth in soft agar, demonstrating the functional importance of inactivating the JNK pathway in gastric cancer. In summary, our data highlight a broadly applicable strategy to identify functional cancer driver mutations, leading us to define the JNK pathway as tumor suppressive in gastric cancer. Summary: A unique computational pan-cancer analysis pinpoints novel tumor suppressing kinases, and highlights the power of functional genomics by defining the JNK pathway as tumor suppressive in gastric cancer.

Identifying Scientific Project-generated Data Citation from Full-text Articles: An Investigation of TCGA Data Citation

Journal of Data and Information Science ◽

10.20309/jdis.201612 ◽

2017 ◽

Vol 1 (2) ◽

pp. 32-44

Author(s):

Jiao Li ◽

Si Zheng ◽

Hongyu Kang ◽

Zhen Hou ◽

Qing Qian

Keyword(s):

Full Text ◽

Cancer Genomics ◽

Scientific Discovery ◽

Scientific Data ◽

The Cancer Genome Atlas ◽

Molecular Therapy ◽

Data Usage ◽

Data Citation ◽

Key Features ◽

TCGA Dataset

Purpose: In the open science era, it is typical to share project-generated scientific data by depositing it in an open and accessible database. Moreover, scientific publications are preserved in a digital library archive. It is challenging to identify the data usage that is mentioned in the literature and associate it with its source. Here, we investigated the data usage of a government-funded cancer genomics project, The Cancer Genome Atlas (TCGA), via a full-text literature analysis. Design/methodology/approach: We focused on identifying articles using the TCGA dataset and constructing linkages between the articles and the specific TCGA dataset. First, we collected 5,372 TCGA-related articles from PubMed Central (PMC). Second, we constructed a benchmark set with 25 full-text articles that truly used the TCGA data in their studies, and we summarized the key features of the benchmark set. Third, the key features were applied to the remaining full-text articles collected from PMC. Findings: The number of publications that use TCGA data has increased significantly since 2011, although the TCGA project was launched in 2005. Additionally, we found that the critical areas of focus in the studies that use the TCGA data were glioblastoma multiforme, lung cancer, and breast cancer; meanwhile, data from the RNA-sequencing (RNA-seq) platform are the most preferred. Research limitations: The current workflow to identify articles that truly used TCGA data is labor-intensive. An automatic method is expected to improve the performance. Practical implications: This study will help cancer genomics researchers determine the latest advancements in cancer molecular therapy, and it will promote data sharing and data-intensive scientific discovery. Originality/value: Few studies have been conducted to investigate data usage by government-funded projects/programs since their launch. In this preliminary study, we extracted articles that use TCGA data from PMC, and we created a link between the full-text articles and the source data.

CanDriS: posterior profiling of cancer-driving sites based on two-component evolutionary model

Briefings in Bioinformatics ◽

10.1093/bib/bbab131 ◽

2021 ◽

Author(s):

Wenyi Zhao ◽

Jingwen Yang ◽

Jingcheng Wu ◽

Guoxing Cai ◽

Yao Zhang ◽

...

Keyword(s):

Somatic Mutations ◽

Cancer Genomics ◽

Evolutionary Model ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Component Mixture ◽

Cancer Driver ◽

Two Component ◽

Potential Cancer

Current cancer genomics databases have accumulated millions of somatic mutations that remain to be further explored. Because of the excess of mutations unrelated to cancer, the great challenge is to identify somatic mutations that are cancer-driving. Under the notion that carcinogenesis is a form of somatic-cell evolution, we developed a two-component mixture model: while the ground component corresponds to passenger mutations, the rapidly evolving component corresponds to driver mutations. Then, we implemented an empirical Bayesian procedure to calculate the posterior probability of a site being cancer-driving. Based on these, we developed the software CanDriS (Cancer Driver Sites) to profile the potential cancer-driving sites for thousands of tumor samples from The Cancer Genome Atlas and the International Cancer Genome Consortium, across tumor types and at the pan-cancer level. As a result, we found that approximately 1% of the sites have posterior probabilities larger than 0.90 and listed potential cancer-wide and cancer-specific driver mutations. By comprehensively profiling all potential cancer-driving sites, CanDriS greatly enhances our ability to refine our knowledge of the genetic basis of cancer and might guide clinical medication in the upcoming era of precision medicine. The results are presented in the database CandrisDB (http://biopharm.zju.edu.cn/candrisdb/).
