An integrative somatic mutation analysis to identify pathways linked with survival outcomes across 19 cancer types

2015 ◽  
Author(s):  
Sunho Park ◽  
Seung-Jun Kim ◽  
Donghyeon Yu ◽  
Samuel Pena-Llopis ◽  
Jianjiong Gao ◽  
...  

Identification of altered pathways that are clinically relevant across human cancers is a key challenge in cancer genomics. We developed a network-based algorithm to integrate somatic mutation data with gene networks and pathways, in order to identify pathways altered by somatic mutations across cancers. We applied our approach to The Cancer Genome Atlas (TCGA) dataset of somatic mutations in 4,790 cancer patients with 19 different types of malignancies. Our analysis identified cancer-type-specific altered pathways enriched with known cancer-relevant genes and drug targets. Consensus clustering using gene expression datasets that included 4,870 patients from TCGA and multiple independent cohorts confirmed that the altered pathways could be used to stratify patients into subgroups with significantly different clinical outcomes. Of particular significance, certain patient subpopulations with poor prognosis were identified because they had specific altered pathways for which there are available targeted therapies. These findings could be used to tailor and intensify therapy in these patients, for whom current therapy is suboptimal.

2020 ◽  
Author(s):  
Bowen Gao ◽  
Yunan Luo ◽  
Jianzhu Ma ◽  
Sheng Wang

Abstract Tumor stratification, which aims at clustering tumors into biologically meaningful subtypes, is a key step towards personalized treatment. Large-scale profiled cancer genomics data enable us to develop computational methods for tumor stratification. However, most existing approaches consider only tumors from an individual cancer type during clustering, which overlooks common patterns across cancer types and leaves the clustering vulnerable to noise within that cancer type. To address these challenges, we proposed cancerAlign to map tumors of the target cancer type into the latent spaces of other source cancer types. These tumors were then clustered in each latent space rather than in the original space, in order to exploit shared patterns across cancer types. Because aligned tumor samples across cancer types are lacking, cancerAlign used adversarial learning to learn the mapping at the population level. It then used consensus clustering to integrate cluster labels from different source cancer types. We evaluated cancerAlign on 7,134 tumors spanning 24 cancer types from TCGA and observed substantial improvement in tumor stratification and cancer gene prioritization. We further revealed the transferability across cancer types, which reflected the similarity among them based on the somatic mutation profile. cancerAlign is an unsupervised approach that provides deeper insights into the heterogeneous and rapidly accumulating somatic mutation profile and can also be applied to other genome-scale molecular information. Availability: https://github.com/bowen-gao/cancerAlign
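The consensus step described above, integrating cluster labels obtained in several latent spaces, can be illustrated with a co-association matrix. The sketch below is not the cancerAlign implementation: the function names, the 0.5 threshold, the connected-component grouping, and the toy labels are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of labelings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                matrix[i][j] += 1.0 / len(labelings)
                matrix[j][i] += 1.0 / len(labelings)
    return matrix


def consensus_clusters(labelings, threshold=0.5):
    """Group samples connected by co-association above `threshold`.

    Assumption: consensus groups are the connected components of the
    thresholded co-association graph (one of several possible choices).
    """
    matrix = co_association(labelings)
    n = len(matrix)
    assigned = [-1] * n
    next_id = 0
    for start in range(n):
        if assigned[start] != -1:
            continue
        assigned[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assigned[j] == -1 and matrix[i][j] > threshold:
                    assigned[j] = next_id
                    stack.append(j)
        next_id += 1
    return assigned


# Hypothetical cluster labels for five tumors from three source latent spaces.
# Label values are arbitrary per labeling; only co-membership matters.
labelings = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
print(consensus_clusters(labelings))
```

Because only pairwise co-membership is counted, the sketch does not need cluster identifiers to match across source cancer types.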


Cancers ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 2487
Author(s):  
Chao Gao ◽  
Guangxu Jin ◽  
Elizabeth Forbes ◽  
Lingegowda S. Mangala ◽  
Yingmei Wang ◽  
...  

IK is a mitotic factor that promotes cell cycle progression. Our previous investigation of 271 endometrial cancer (EC) samples from the Cancer Genome Atlas (TCGA) dataset showed that IK somatic mutations were enriched in a cluster of patients with high-grade and high-stage cancers, and that this group had longer survival. This study provides insight into how IK somatic mutations contribute to EC pathophysiology. We analyzed the somatic mutational landscape of the IK gene in 547 EC patients using the expanded TCGA dataset. Co-immunoprecipitation and mass spectrometry were used to identify protein interactions. In vitro and in vivo experiments were used to evaluate IK’s role in EC. Patients with IK-inactivating mutations had longer survival during 10-year follow-up. Frameshift and stop-gain mutations were common and were associated with decreased IK expression. IK knockdown led to enrichment of G2/M phase cells, inactivation of DNA repair signaling mediated by heterodimerization of Ku80 and Ku70, and sensitization of EC cells to cisplatin treatment. IK/Ku80 mutations were accompanied by higher mutation rates and associated with significantly better overall survival. Inactivating mutations of the IK gene and loss of IK protein expression were associated with weakened Ku80/Ku70-mediated DNA repair, increased mutation burden, and better response to chemotherapy in patients with EC.


2021 ◽  
Vol 12 ◽  
Author(s):  
Kaisong Bai ◽  
Tong Zhao ◽  
Yilong Li ◽  
Xinjian Li ◽  
Zhantian Zhang ◽  
...  

Pancreatic adenocarcinoma (PAAD) is one of the deadliest malignancies, and mortality from PAAD has continued to rise even as mortality for other major cancers has improved substantially. Although many studies of PAAD exist, few have dissected the oncogenic mechanisms of PAAD based on genomic variation. In this study, we integrated somatic mutation data and gene expression profiles obtained by high-throughput sequencing to characterize the pathogenesis of PAAD. The mutation profile, containing 182 samples with 25,470 somatic mutations, was obtained from The Cancer Genome Atlas (TCGA). The mutation landscape was generated, and somatic mutations in PAAD were found to show a preference for mutation location. Combining the mutation matrix with gene expression profiles identified 31 driver genes that were closely associated with tumor cell invasion and apoptosis. Co-expression networks were constructed based on 461 genes significantly associated with driver genes, and the hub gene FAM133A in the network was found to be associated with tumor metastasis. Further, the cascade relationship of somatic mutation, long non-coding RNA (lncRNA), and microRNA (miRNA) was constructed to reveal a new mechanism for the involvement of mutations in post-transcriptional regulation. We also identified prognostic markers that are significantly associated with overall survival (OS) of PAAD patients and constructed a risk score model to identify patients’ survival risk. In summary, our study revealed the pathogenic mechanisms and prognostic markers of PAAD, providing theoretical support for the development of precision medicine.


2020 ◽  
Author(s):  
Xun Gu

Abstract Current cancer genomics databases have accumulated millions of somatic mutations that remain to be further explored, facilitating large-scale high-throughput analyses of the underlying mechanisms that may contribute to malignant initiation or progression. Because passenger mutations (unrelated to cancer) predominate, the challenge is to identify somatic mutations that are cancer-driving. Under the notion that carcinogenesis is a form of somatic-cell evolution, we developed a two-component mixture model that enables the following analyses. (i) We formulated a quasi-likelihood approach to test whether the two-component model is significantly better than a single-component model, which can be used to predict new cancer genes. (ii) We implemented an empirical Bayesian method to calculate the posterior probability that a site is cancer-driving, for all sites of a gene, which can be used to predict new driver sites. (iii) We developed a computational procedure to calculate the somatic selection intensity at driver sites and passenger sites, respectively, as well as site-specific profiles for all sites. Using these newly developed methods, we comprehensively analyzed 294 known cancer genes based on The Cancer Genome Atlas (TCGA) database.
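For a two-component mixture, the posterior probability that a site belongs to the driver component follows from Bayes' rule. The sketch below is illustrative only: the abstract does not state the component distributions, so the Poisson model for per-site recurrence counts, the parameter names, and the numeric values are assumptions, not the author's method.

```python
import math


def poisson_pmf(count, rate):
    """Probability of observing `count` events under a Poisson with mean `rate`."""
    return math.exp(-rate) * rate ** count / math.factorial(count)


def posterior_driver(count, prior_driver, rate_passenger, rate_driver):
    """Posterior probability that a site is in the driver component.

    Assumption: recurrence counts are Poisson in both components, with a
    higher rate in the driver component. In an empirical Bayes setting the
    prior and the rates would be estimated from the data; here they are
    passed in directly.
    """
    driver_term = prior_driver * poisson_pmf(count, rate_driver)
    passenger_term = (1.0 - prior_driver) * poisson_pmf(count, rate_passenger)
    return driver_term / (driver_term + passenger_term)


# Hypothetical parameters: 1% of sites are drivers; passenger sites recur
# about 0.2 times on average and driver sites about 5 times.
for count in (0, 1, 3, 6):
    print(count, posterior_driver(count, 0.01, 0.2, 5.0))
```

Under these assumed parameters the posterior rises with the recurrence count, which is the behaviour a threshold on the posterior probability relies on.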


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Yaofei Jiang ◽  
Lulu Le

Long noncoding RNAs (lncRNAs) have been confirmed to play a crucial role in human disease, especially in tumor development and progression. Small nucleolar RNA host gene 3 (SNHG3), a newly identified lncRNA, has been found to be dysregulated in various cancers. Nevertheless, the results remain controversial. Thus, we aimed to comprehensively analyze the available data to clarify the association between SNHG3 expression and clinical outcomes in multiple cancers. We searched the PubMed, Web of Science, Cochrane Library, Embase, and MEDLINE databases to identify eligible articles. STATA software was applied to calculate the hazard ratio (HR) and odds ratio (OR) with 95% confidence interval (95% CI) for survival outcomes and clinical parameters, respectively. In addition, data from The Cancer Genome Atlas (TCGA) dataset were extracted to verify the results of our meta-analysis. Thirteen studies totaling 919 cancer patients were included in this meta-analysis. The results demonstrated that high SNHG3 expression was significantly associated with poor overall survival (OS) (HR=2.53, 95% CI: 1.94-3.31) in cancers, disease-free survival (DFS) (HR=3.89, 95% CI: 1.34-11.3), and recurrence-free survival (RFS) (HR=2.42, 95% CI: 1.14-5.15) in hepatocellular carcinoma. Analysis stratified by analysis method, sample size, follow-up time, and cancer type further verified the prognostic value of SNHG3. Additionally, patients with high SNHG3 expression tended to have more advanced clinical stage, higher histological grade, earlier distant metastasis, and earlier lymph node metastasis. Mining of the TCGA dataset validated that SNHG3 was upregulated in various cancers and predicted worse OS and DFS. Overexpressed SNHG3 was strongly associated with poor survival and clinical outcomes in human cancers and therefore can serve as a promising biomarker for predicting patients’ prognosis.
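A hazard ratio reported with a 95% CI can be converted back to a log-scale standard error, which is the quantity an inverse-variance meta-analysis weights by. The sketch below is a generic fixed-effect illustration, not the STATA procedure used in the study (the abstract does not say whether a fixed- or random-effects model was used); the function names and the per-study values passed to the pooling function are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, ci_low, ci_high):
    """Return ln(HR) and its standard error recovered from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(hr), se


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of (hr, ci_low, ci_high) tuples.

    Returns the pooled HR with its 95% CI as a (hr, low, high) tuple.
    """
    weight_sum = 0.0
    weighted_log_hr = 0.0
    for hr, ci_low, ci_high in studies:
        log_hr, se = log_hr_and_se(hr, ci_low, ci_high)
        weight = 1.0 / se ** 2
        weight_sum += weight
        weighted_log_hr += weight * log_hr
    pooled = weighted_log_hr / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return (
        math.exp(pooled),
        math.exp(pooled - Z_95 * pooled_se),
        math.exp(pooled + Z_95 * pooled_se),
    )


# Standard error implied by the pooled OS estimate quoted in the abstract.
print(log_hr_and_se(2.53, 1.94, 3.31))

# Hypothetical per-study hazard ratios, for illustration only.
print(pool_fixed_effect([(2.1, 1.2, 3.7), (3.0, 1.5, 6.0), (2.4, 1.6, 3.6)]))
```

Pooling is done on the log scale because the sampling distribution of ln(HR) is approximately normal, whereas that of the HR itself is skewed.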


2020 ◽  
Author(s):  
Ji Hongchen ◽  
Li Junjie ◽  
Zhang Qiong ◽  
Yang Jingyue ◽  
Tian Fei ◽  
...  

Abstract Background: Mutation processes leave different signatures in genes. Previous studies have suggested that both the mutated and flanking bases influence somatic mutation characteristics. However, the understanding of how flanking sequences influence somatic mutation characteristics is limited. Materials and methods: We constructed a long short-term memory – self organizing map (LSTM-SOM) unsupervised neural network. By extracting mutated sequence features via LSTM and clustering similar features with SOM, somatic mutations in The Cancer Genome Atlas database were clustered according to their mutation type and flanking sequences into classes termed mutation blots (MBs). The relationship between MBs and cancer characteristics was then analyzed. Finally, we clustered the patients into different classes according to MB composition using the K-means method, and then studied the differences in clinical features and survival between classes. Results: Ten classes of mutant sequences (MBs) were obtained from 2,141,527 somatic mutations. Different features in mutation bases and flanking sequences were revealed among MBs. MBs reflect both the site and pathological features of cancers. MBs were related to clinical features, including age, sex, and cancer stage. The MB class in a given gene is associated with survival. Finally, patients were clustered into 7 classes according to MB composition. Significant differences in survival and clinical features were observed among different patient classes. Conclusions: Our study provides a novel method for analyzing the information of mutant sequences and reveals the extensive relationships among mutant sequences, clinical features, and cancer patient survival.


2020 ◽  
Author(s):  
Haoyang Zhang ◽  
Junkang Wei ◽  
Zifeng Liu ◽  
Xun Liu ◽  
Yutian Chong ◽  
...  

Abstract Motivation: Oncogenes are genes whose malfunctions play critical roles in cancer development, and their discovery is a major aim in the study of cancer mechanisms. Oncogenes with frequent mutations have been identified by counting mutation frequency, and it is believed that many more could be discovered by differential mutational profile analysis. However, current methods commonly utilize only mutations in the cancer population, which introduces an obvious bias in background mutation modelling. Methods: To predict oncogenes efficiently, we developed a method, DGAT-onco, that analyzes the frequency distribution and functional impacts of mutations in both the cancer and the natural population. Our method can capture the mutational difference between the two populations and provide a comprehensive view of the genomic basis underlying cancer development. DGAT-onco was constructed using germline mutations from the 1000 Genomes project and somatic mutations of 33 cancer types from the Cancer Genome Atlas (TCGA) dataset. Its reliability was verified on an independent test set including 19 cancers from other sources. Results: We demonstrated that our method is more effective than alternative methods in oncogene discovery. This approach achieves higher classification performance in oncogene discovery than 6 alternative methods, and 22.8% of the significant genes identified by our method were verified as oncogenes by the Cancer Gene Census (CGC). Availability: DGAT-onco is available at https://github.com/zhanghaoyang0/[email protected] or [email protected]


2018 ◽  
Author(s):  
Andrew M. Hudson ◽  
Natalie L. Stephenson ◽  
Cynthia Li ◽  
Eleanor Trotter ◽  
Adam J. Fletcher ◽  
...  

Abstract A major challenge in cancer genomics is identifying driver mutations from the large number of neutral passenger mutations within a given tumor. Here, we utilize motifs critical for kinase activity to functionally filter genomic data to identify driver mutations that would otherwise be lost within mutational noise. In the first step of our screen, we define a putative tumor suppressing kinome by identifying kinases with truncation mutations occurring within or before the kinase domain. We aligned these kinase sequences and, utilizing data from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas databases, identified amino acids that represent predicted hotspots for loss-of-function (LOF) mutations. The functional consequences of new LOF mutations were validated, and the top 15 hotspot LOF residues were used in a pan-cancer analysis to define the tumor-suppressing kinome. A ranked list revealed MAP2K7 as a candidate tumor suppressor in gastric cancer, despite the mutational frequency of MAP2K7 falling within the mutational noise for this cancer type. The majority of mutations in MAP2K7 abolished catalytic activity compared to the wild type kinase, consistent with a tumor suppressive role for MAP2K7 in gastric cancer. Furthermore, reactivation of the JNK pathway in gastric cancer cells harboring LOF mutations in MAP2K7 or JNK1 suppresses clonogenicity and growth in soft agar, demonstrating the functional importance of inactivating the JNK pathway in gastric cancer. In summary, our data highlight a broadly applicable strategy to identify functional cancer driver mutations, leading us to define the JNK pathway as tumor suppressive in gastric cancer. Summary: A unique computational pan-cancer analysis pinpoints novel tumor suppressing kinases, and highlights the power of functional genomics by defining the JNK pathway as tumor suppressive in gastric cancer.
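The first screening step described above, keeping kinases that carry a truncation mutation within or before the kinase domain, amounts to a positional filter. The sketch below is a minimal illustration under stated assumptions: the record layout, the set of effects treated as truncating, and the gene names and coordinates are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

# Assumption: nonsense and frameshift changes are the truncating effects.
TRUNCATING_EFFECTS = {"nonsense", "frameshift"}


@dataclass(frozen=True)
class Mutation:
    gene: str
    position: int  # amino-acid position of the change
    effect: str


def kinases_truncated_by_domain_end(mutations, domain_end):
    """Return genes with a truncating mutation at or before the domain end.

    `domain_end` maps each gene to the last residue of its kinase domain;
    genes without an annotated domain are skipped.
    """
    hits = set()
    for mutation in mutations:
        end = domain_end.get(mutation.gene)
        if end is None:
            continue
        if mutation.effect in TRUNCATING_EFFECTS and mutation.position <= end:
            hits.add(mutation.gene)
    return sorted(hits)


# Hypothetical kinases and coordinates, for illustration only.
domain_end = {"KINASE_A": 380, "KINASE_B": 290}
mutations = [
    Mutation("KINASE_A", 150, "nonsense"),    # inside the domain: kept
    Mutation("KINASE_A", 400, "frameshift"),  # after the domain: ignored
    Mutation("KINASE_B", 120, "missense"),    # not truncating: ignored
    Mutation("KINASE_C", 50, "nonsense"),     # no domain annotation: skipped
]
print(kinases_truncated_by_domain_end(mutations, domain_end))
```

A truncation after the domain end is excluded because the kinase domain would still be translated intact.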


2017 ◽  
Vol 1 (2) ◽  
pp. 32-44
Author(s):  
Jiao Li ◽  
Si Zheng ◽  
Hongyu Kang ◽  
Zhen Hou ◽  
Qing Qian

Abstract Purpose: In the open science era, it is typical to share project-generated scientific data by depositing it in an open and accessible database. Moreover, scientific publications are preserved in a digital library archive. It is challenging to identify the data usage that is mentioned in literature and associate it with its source. Here, we investigated the data usage of a government-funded cancer genomics project, The Cancer Genome Atlas (TCGA), via a full-text literature analysis. Design/methodology/approach: We focused on identifying articles using the TCGA dataset and constructing linkages between the articles and the specific TCGA dataset. First, we collected 5,372 TCGA-related articles from PubMed Central (PMC). Second, we constructed a benchmark set with 25 full-text articles that truly used the TCGA data in their studies, and we summarized the key features of the benchmark set. Third, the key features were applied to the remaining full-text articles collected from PMC. Findings: The number of publications that use TCGA data has increased significantly since 2011, although the TCGA project was launched in 2005. Additionally, we found that the critical areas of focus in the studies that use the TCGA data were glioblastoma multiforme, lung cancer, and breast cancer; meanwhile, data from the RNA-sequencing (RNA-seq) platform were the most frequently used. Research limitations: The current workflow to identify articles that truly used TCGA data is labor-intensive. An automatic method is expected to improve the performance. Practical implications: This study will help cancer genomics researchers determine the latest advancements in cancer molecular therapy, and it will promote data sharing and data-intensive scientific discovery. Originality/value: Few studies have been conducted to investigate data usage by government-funded projects/programs since their launch. In this preliminary study, we extracted articles that use TCGA data from PMC, and we created a link between the full-text articles and the source data.


Author(s):  
Wenyi Zhao ◽  
Jingwen Yang ◽  
Jingcheng Wu ◽  
Guoxing Cai ◽  
Yao Zhang ◽  
...  

Abstract Current cancer genomics databases have accumulated millions of somatic mutations that remain to be further explored. Because of the large excess of mutations unrelated to cancer, the great challenge is to identify somatic mutations that are cancer-driving. Under the notion that carcinogenesis is a form of somatic-cell evolution, we developed a two-component mixture model: while the ground component corresponds to passenger mutations, the rapidly evolving component corresponds to driver mutations. Then, we implemented an empirical Bayesian procedure to calculate the posterior probability of a site being cancer-driving. Based on these, we developed a software tool, CanDriS (Cancer Driver Sites), to profile the potential cancer-driving sites for thousands of tumor samples from the Cancer Genome Atlas and International Cancer Genome Consortium, both within tumor types and at the pan-cancer level. As a result, we found that approximately 1% of the sites have posterior probabilities larger than 0.90, and we listed potential cancer-wide and cancer-specific driver mutations. By comprehensively profiling all potential cancer-driving sites, CanDriS greatly enhances our ability to refine our knowledge of the genetic basis of cancer and might guide clinical medication in the upcoming era of precision medicine. The results are displayed in a database, CandrisDB (http://biopharm.zju.edu.cn/candrisdb/).

