Cancer genes discovery based on integrating transcriptomic data and the impact of gene length

The Impact of Variants at Branchpoint Splicing Elements in Cancer Genes

SSRN Electronic Journal ◽

10.2139/ssrn.3933049 ◽

2021 ◽

Author(s):

Daffodil Canson ◽

Troy Dumenil ◽

Michael Parsons ◽

Tracy O’Mara ◽

Aimee Davidson ◽

...

Keyword(s):

Cancer Genes ◽

The Impact

Meta-analysis of transcriptomic data reveals clusters of consistently deregulated gene and disease ontologies in Down syndrome

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009317 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009317

Author(s):

Ilario De Toma ◽

Cesar Sierra ◽

Mara Dierssen

Keyword(s):

Down Syndrome ◽

Differential Expression ◽

Web Application ◽

Meta Analysis ◽

Chromosome 21 ◽

Complex Disorders ◽

Transcriptomic Data ◽

Genome Wide ◽

A Genome ◽

The Impact

Trisomy of human chromosome 21 (HSA21) causes Down syndrome (DS). The trisomy does not simply result in the upregulation of HSA21-encoded genes but also leads to a genome-wide transcriptomic deregulation, which affects each tissue and cell type differently as a result of epigenetic mechanisms and protein-protein interactions. We performed a meta-analysis integrating the differential expression (DE) analyses of all publicly available transcriptomic datasets, both in human and mouse, comparing trisomic and euploid transcriptomes from different sources. We integrated all these data in a “DS network”. We found that genome-wide deregulation as a consequence of trisomy 21 is not arbitrary, but involves deregulation of specific molecular cascades in which both HSA21 genes and HSA21 interactors are more consistently deregulated compared to other genes. In fact, gene deregulation happens in “clusters”, so that groups of 2 to 13 genes are found consistently deregulated. Most of these events of “co-deregulation” involve genes belonging to the same GO category, and genes associated with the same disease class. The most consistent changes are enriched in interferon-related categories and neutrophil activation, reinforcing the concept that DS is an inflammatory disease. Our results also suggest that the impact of the trisomy might diverge in each tissue due to the different gene set deregulation, even though the triplicated genes are the same. Our original method to integrate transcriptomic data confirmed not only the importance of known genes, such as SOD1, but also detected new ones that could be extremely useful for generating or confirming hypotheses and supporting new putative therapeutic candidates. We created “metaDEA”, an R package that uses our method to integrate every kind of transcriptomic data and therefore could be used with other complex disorders, such as cancer. We also created a user-friendly web application to query Ensembl gene IDs and retrieve all the information of their differential expression across the datasets.

Protein Domain Hotspots Reveal Functional Mutations across Genes in Cancer

10.1101/015719 ◽

2015 ◽

Author(s):

Martin L Miller ◽

Ed Reznik ◽

Nicholas P Gauthier ◽

Bülent Arman Aksoy ◽

Anil Korkut ◽

...

Keyword(s):

Statistical Power ◽

Cancer Genomics ◽

Protein Domains ◽

Rapid Expansion ◽

Protein Domain ◽

Cancer Genes ◽

Sequence Alignments ◽

Hotspot Analysis ◽

Functional Interpretation ◽

The Impact

In cancer genomics, frequent recurrence of mutations in independent tumor samples is a strong indication of functional impact. However, rare functional mutations can escape detection by recurrence analysis for lack of statistical power. We address this problem by extending the notion of recurrence of mutations from single genes to gene families that share homologous protein domains. In addition to lowering the threshold of detection, this sharpens the functional interpretation of the impact of mutations, as protein domains more succinctly embody function than entire genes. Mapping mutations in 22 different tumor types to equivalent positions in multiple sequence alignments of protein domains, we confirm well-known functional mutation hotspots and make two types of discoveries: 1) identification and functional interpretation of uncharacterized rare variants in one gene that are equivalent to well-characterized mutations in canonical cancer genes, such as uncharacterized ERBB4 (S303F) mutations that are analogous to canonical ERBB2 (S310F) mutations in the furin-like domain, and 2) detection of previously unknown mutation hotspots with novel functional implications. With the rapid expansion of cancer genomics projects, protein domain hotspot analysis is likely to provide many more leads linking mutations in proteins to the cancer phenotype.
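The idea of pooling mutations at equivalent positions of a domain alignment can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names, the toy gapped sequences, the mutation list, and the helper names `residue_to_column` and `column_counts` are all hypothetical, and no statistical test for hotspot significance is included.

```python
from collections import Counter


def residue_to_column(aligned_seq):
    """Map 1-based residue positions of a gapped sequence to 0-based alignment columns."""
    mapping = {}
    residue = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":  # gaps do not consume a residue position
            residue += 1
            mapping[residue] = column
    return mapping


def column_counts(alignment, mutations):
    """Count mutations per alignment column, pooled over all genes in the family.

    alignment: gene name -> gapped domain sequence (all of equal length)
    mutations: list of (gene name, 1-based residue position within the domain)
    """
    maps = {gene: residue_to_column(seq) for gene, seq in alignment.items()}
    counts = Counter()
    for gene, position in mutations:
        column = maps[gene].get(position)
        if column is not None:  # ignore positions outside the aligned domain
            counts[column] += 1
    return counts


# Toy, made-up alignment and mutations (not real ERBB sequences).
alignment = {
    "GENE_A": "MK-SLFC",
    "GENE_B": "MKTS-FC",
}
mutations = [("GENE_A", 3), ("GENE_A", 3), ("GENE_B", 4)]
counts = column_counts(alignment, mutations)
```

In this toy case residue 3 of GENE_A and residue 4 of GENE_B fall in the same alignment column, so their mutations are counted together even though the residue numbers differ, which is the sense in which a rare variant in one gene can borrow evidence from a well-studied homolog.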

Tumour purity as a prognostic factor in colon cancer

10.1101/263723 ◽

2018 ◽

Cited By ~ 1

Author(s):

Yihao Mao ◽

Qingyang Feng ◽

Peng Zheng ◽

Liangliang Yang ◽

Tianyu Liu ◽

...

Keyword(s):

Colon Cancer ◽

Adjuvant Chemotherapy ◽

Prognostic Factor ◽

Relative Proportion ◽

Genetic Profile ◽

M2 Macrophages ◽

Negatively Associated ◽

Transcriptomic Data ◽

Mutation Burden ◽

The Impact

Tumour purity is defined as the proportion of cancer cells in the tumour tissue. The impact of tumour purity on colon cancer (CC) prognosis, genetic profile and microenvironment has not been thoroughly assessed. Therefore, clinical and transcriptomic data from three public datasets, GSE17536/17537, GSE39582, and TCGA, were retrospectively collected (n = 1248). Tumour purity of each sample was inferred by a computational method based on transcriptomic data. Stage III and MMR-deficient (dMMR) CC patients showed a significantly lower tumour purity. Low purity CC conferred worse survival and tumour purity was identified as an independent prognostic factor. Moreover, high tumour purity CC patients benefited more from adjuvant chemotherapy. Subsequent genomic analysis found that the mutation burden was negatively associated with tumour purity, with only APC and KRAS significantly more mutated in high purity CC. However, no somatic copy number alteration event was correlated with tumour purity. Furthermore, immune-related pathways and immunotherapy-associated markers (PD-1, PD-L1, CTLA-4, LAG-3, and TIM-3) were highly enriched in low purity samples. Notably, the relative proportion of M2 macrophages and neutrophils, which indicated worse survival in CC, was negatively associated with tumour purity. Therefore, tumour purity exhibited potential value for CC prognostic stratification as well as adjuvant chemotherapy benefit prediction. The relatively worse survival in low purity CC may be attributable to higher mutation frequency in key pathways and purity-related microenvironmental changes. Summary: Low purity colon cancer patients had worse survival and benefited less from adjuvant chemotherapy. The mutation burden was negatively associated with tumour purity. Low purity samples exhibited an intense immune phenotype with more M2 macrophage and neutrophil infiltration.

Evaluation of BRCA1/2 and homologous recombination defects in ovarian cancer and impact on clinical outcomes.

Journal of Clinical Oncology ◽

10.1200/jco.2017.35.15_suppl.5511 ◽

2017 ◽

Vol 35 (15_suppl) ◽

pp. 5511-5511 ◽

Cited By ~ 4

Author(s):

Melinda S. Yates ◽

Kirsten Timms ◽

Molly S Daniels ◽

Holly D. Oakley ◽

Mark F. Munsell ◽

...

Keyword(s):

Ovarian Cancer ◽

Homologous Recombination ◽

Cancer Patients ◽

Clinical Outcomes ◽

Germline Mutation ◽

Large Scale ◽

Recurrent Ovarian Cancer ◽

State Transitions ◽

Cancer Genes ◽

The Impact

5511 Background: Recent studies show that germline or somatic BRCA1/2 mutations and homologous recombination (HR) defects can be used to predict response to PARP inhibitors in recurrent ovarian cancer. However, the impact of defects in BRCA1/2 and HR genes on overall clinical outcomes is not yet defined for patients undergoing neoadjuvant chemotherapy (NACT) versus upfront surgical debulking (USD). Methods: Previously untreated ovarian cancer patients were prospectively enrolled under approved IRB protocol. Germline and tumor BRCA1/2 mutation testing and methylation were analyzed when sufficient tumor and blood were available. Mutation in 21 additional hereditary cancer genes (including HR genes) was also evaluated. Tumor HR defects were scored on LOH, telomeric allelic imbalance, and large-scale state transitions (as previously described). Presence of germline or somatic BRCA1/2 mutations, BRCA1 methylation, HR score ≥42, or germline mutation in other HR genes was defined together as HRD positive. Results: Of 299 enrolled patients, 129 (43%) received USD and 170 (57%) received NACT. Patients receiving USD had better outcomes compared to NACT, including overall survival (OS, 65.8 vs 45.2 months, p = 0.0003) and event free survival (EFS, 24.8 vs 15.6 months, p < 0.0001). In the overall cohort, EFS was significantly longer for HRD positive patients vs HRD negative (20.5 vs 16.3 months, p = 0.0268). Patients with somatic and germline BRCA1/2 mutations had longer OS vs BRCA1/2 negative (65.3 vs 46.1 months, p = 0.0403). Overall outcomes were worse in NACT compared to USD, but the impact of BRCA1/2 mutations and HR defects was stronger in this group. NACT patients with any HR defect had longer EFS (19.7 vs 14.5 months, p = 0.0247). NACT patients with BRCA1/2 germline mutations had longer OS (65.3 vs 38.3 months, p = 0.0230). NACT patients with BRCA1/2 germline mutation had longer EFS (22.6 vs 14.6 months, p = 0.0047). OS and EFS in USD patients were significantly changed based on only debulking status; mutation or HR status did not have a statistically significant effect. Conclusions: While HR defects and BRCA1/2 mutations influence overall outcomes for ovarian cancer patients, the impact is stronger in NACT compared to USD.
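The composite HRD-positive definition given in the Methods can be written as a single predicate. The sketch below is only an illustration of that stated rule: the `TumorProfile` class and its field names are hypothetical, and only the threshold of 42 and the list of criteria come from the abstract.

```python
from dataclasses import dataclass

HR_SCORE_THRESHOLD = 42  # from the abstract: HR score >= 42 counts as a defect


@dataclass
class TumorProfile:
    """Hypothetical per-patient record; field names are illustrative only."""
    brca_germline_mutation: bool = False
    brca_somatic_mutation: bool = False
    brca1_methylation: bool = False
    hr_score: int = 0
    other_hr_germline_mutation: bool = False


def is_hrd_positive(profile):
    """Return True if any one of the criteria listed in the abstract is met."""
    return (
        profile.brca_germline_mutation
        or profile.brca_somatic_mutation
        or profile.brca1_methylation
        or profile.hr_score >= HR_SCORE_THRESHOLD
        or profile.other_hr_germline_mutation
    )
```

Because the criteria are joined by "or", a patient with no BRCA1/2 alteration can still be HRD positive on the score alone, for example `is_hrd_positive(TumorProfile(hr_score=42))`.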

Scrublet: computational identification of cell doublets in single-cell transcriptomic data

10.1101/357368 ◽

2018 ◽

Cited By ~ 23

Author(s):

Samuel L. Wolock ◽

Romain Lopez ◽

Allon M. Klein

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

Expert Knowledge ◽

Transcriptomic Data ◽

Nearest Neighbor Classifier ◽

Cell Clustering ◽

Powerful Approach ◽

Single Cell Rna Sequencing ◽

The Impact ◽

Neighbor Classifier

Single-cell RNA-sequencing has become a widely used, powerful approach for studying cell populations. However, these methods often generate multiplet artifacts, where two or more cells receive the same barcode, resulting in a hybrid transcriptome. In most experiments, multiplets account for several percent of transcriptomes and can confound downstream data analysis. Here, we present Scrublet (Single-Cell Remover of Doublets), a framework for predicting the impact of multiplets in a given analysis and identifying problematic multiplets. Scrublet avoids the need for expert knowledge or cell clustering by simulating multiplets from the data and building a nearest neighbor classifier. To demonstrate the utility of this approach, we test Scrublet on several datasets that include independent knowledge of cell multiplets.
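The two steps named in the abstract, simulating multiplets from the data and scoring cells with a nearest neighbor classifier, can be sketched in plain Python. This is a toy illustration and not the Scrublet package or its API: the function names, the total-count normalization, the Euclidean distance, and the default `k` are assumptions made here, and the preprocessing a real analysis would need (gene filtering, dimensionality reduction) is omitted.

```python
import math
import random


def simulate_doublets(cells, n_doublets, rng):
    """Create synthetic doublets by summing the counts of two randomly chosen cells."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(cells, 2)
        doublets.append([x + y for x, y in zip(a, b)])
    return doublets


def normalize(profile):
    """Scale a count vector to sum to 1 so that doublets are not flagged by depth alone."""
    total = sum(profile) or 1.0
    return [value / total for value in profile]


def doublet_scores(cells, n_doublets=None, k=5, seed=0):
    """Score each observed cell by the fraction of simulated doublets among its k nearest neighbors."""
    rng = random.Random(seed)
    n_doublets = n_doublets or len(cells)
    simulated = simulate_doublets(cells, n_doublets, rng)
    observed = [normalize(c) for c in cells]
    # Pool of (profile, is_simulated); observed cells come first, in input order.
    pool = [(p, False) for p in observed] + [(normalize(d), True) for d in simulated]
    scores = []
    for i, profile in enumerate(observed):
        neighbors = sorted(
            (math.dist(profile, other), is_sim)
            for j, (other, is_sim) in enumerate(pool)
            if j != i  # skip the cell itself
        )[:k]
        scores.append(sum(is_sim for _, is_sim in neighbors) / k)
    return scores


# Toy data: ten cells of each of two made-up types, plus one mixed profile.
cells = [[10, 0] for _ in range(10)] + [[0, 10] for _ in range(10)] + [[10, 10]]
scores = doublet_scores(cells)
```

In this toy setup a cell with a mixed profile sits among the simulated doublets and receives a high score, while a doublet formed from two cells of the same type is indistinguishable from a singlet, a limitation this sketch shares with any approach based on simulated mixtures.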

Evaluation of clonal hematopoiesis in late stage NSCLC using a next-generation sequencing panel targeting cancer genes.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.9050 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. 9050-9050 ◽

Cited By ~ 1

Author(s):

Stephanie J. Yaung ◽

Frederike Fuhlbrück ◽

Johnny Wu ◽

Fergal Casey ◽

Maureen Peterson ◽

...

Keyword(s):

Lung Cancer ◽

Next Generation Sequencing ◽

Locally Advanced ◽

Sequencing Analysis ◽

Plasma Samples ◽

Cancer Genes ◽

Next Generation ◽

Clonal Hematopoiesis ◽

The Impact ◽

Generation Sequencing

9050 Background: Somatic mutations derived from the expansion of clonal populations of blood cells (clonal hematopoiesis of indeterminate potential, or CHIP) may be detected in sequencing of cell-free DNA (cfDNA) samples. We evaluated the potential implications of CHIP in targeted sequencing of lung cancer plasma samples using matched peripheral blood mononuclear cells (PBMC) to identify CHIP. Methods: Samples were evaluated from OAK, a phase 3 trial of atezolizumab in locally advanced or metastatic NSCLC following failure with platinum-based therapy. 94 samples from Cycle 1 Day 1 (C1D1) plasma and matched PBMC were analyzed with the AVENIO ctDNA Surveillance Kit (For Research Use Only, not for use in diagnostic procedures), a 198-kb next-generation sequencing panel targeting cancer genes. Plasma samples from subsequent cycles of therapy (C2D1, C3D1, and C4D1) were also sequenced with the same panel. Using median input amounts of 22.8 ng cfDNA and 50 ng PBMC DNA, we obtained median deduplicated depths of 5413 and 5070, respectively. Results: In C1D1 cfDNA, a median of 120 single nucleotide variants were detected per sample, with 5.13% of variants not identified in matched PBMC (i.e., putative tumor-derived somatic variants) versus 94.87% of variants identified in matched PBMC (i.e., germline or CHIP variants). While the majority of PBMC-matched variants were SNPs with allele frequency (AF) around 50% or 100% as expected, there was a median of 1 (range 0-8) PBMC-matched cfDNA variant per sample with AF below 10%. Consistent with CHIP, the number of PBMC-matched cfDNA variants per subject below AF 10% was positively associated with age (p-value = 0.0145), and TP53 was the most frequently mutated gene. We found similar results in plasma samples from subsequent cycles. Conclusions: Plasma and PBMC sequencing analysis identified potential mutations derived from CHIP. However, 39% of cfDNA samples had zero potential CHIP mutations identified in the study, possibly due to the specific regions targeted by the AVENIO assay. While this study suggests that only a small percentage of variants detected by the AVENIO Surveillance panel in lung cancer are derived from CHIP, further studies are warranted to assess the impact and removal of these variants.
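The variant grouping described in the Results can be sketched as a simple rule. This is an illustration under stated assumptions, not the study's analysis code: the function name, the string labels, and the variant keys are hypothetical, the 10% cutoff is applied to the cfDNA allele frequency (the abstract does not say explicitly which sample's AF was used), and PBMC-matched variants at or above the cutoff are simply labelled as likely germline.

```python
def classify_cfdna_variant(variant_key, allele_fraction, pbmc_variants, low_af_cutoff=0.10):
    """Assign a cfDNA variant to one of three illustrative groups.

    variant_key: any hashable identifier, e.g. (chromosome, position, ref, alt)
    allele_fraction: cfDNA allele frequency as a fraction between 0 and 1 (assumption)
    pbmc_variants: set of variant keys detected in the matched PBMC sample
    """
    if variant_key not in pbmc_variants:
        # Not seen in matched blood cells: putative tumor-derived somatic variant.
        return "putative_tumor_derived"
    if allele_fraction < low_af_cutoff:
        # Seen in blood cells at low AF: consistent with CHIP.
        return "putative_chip"
    # Seen in blood cells at higher AF (around 50% or 100% for germline SNPs).
    return "likely_germline"


# Hypothetical variant keys for illustration only.
pbmc = {("chr17", 1000, "C", "T"), ("chr2", 2000, "G", "A")}
labels = [
    classify_cfdna_variant(("chr17", 1000, "C", "T"), 0.02, pbmc),
    classify_cfdna_variant(("chr2", 2000, "G", "A"), 0.49, pbmc),
    classify_cfdna_variant(("chr7", 3000, "A", "G"), 0.01, pbmc),
]
```

Counting the `putative_chip` labels per sample would give the per-subject number that the abstract reports as positively associated with age.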

BRCA testing concordance with national guidelines for patients with breast cancer in community cancer programs.

Journal of Clinical Oncology ◽

10.1200/jco.2020.38.15_suppl.1526 ◽

2020 ◽

Vol 38 (15_suppl) ◽

pp. 1526-1526

Author(s):

Leigh Boehmer ◽

Latha Shivakumar ◽

Christine B. Weldon ◽

Julia Rachel Trosman ◽

Stephanie A. Cohen ◽

...

Keyword(s):

Breast Cancer ◽

Family History ◽

High Risk ◽

Genetic Test ◽

Breast Conserving Surgery ◽

Test Results ◽

Cancer Genes ◽

National Guidelines ◽

Genetic Test Results ◽

The Impact

1526 Background: Current National Comprehensive Cancer Network guidelines for genetic/familial high-risk assessment state that testing for highly penetrant breast/ovarian cancer genes is clinically indicated for women with early onset (≤ 45 years) or metastatic HER-2 negative breast cancer. A recent Association of Community Cancer Centers (ACCC) survey (N = 95) showed that > 80% of respondents reported ≤ 50% testing rate of patients with breast cancer who met guidelines. Given this disconnect, ACCC partnered with 15 community cancer programs to assess practice gaps and support interventions to improve access to genetic counseling (GC)/testing. Methods: Pre-intervention data from 9/15 partner programs for women diagnosed with stages 0-III breast cancer between 01/01/2017 and 06/30/2019 were collected. De-identified variables included: family history documentation, GC appointment/test results, and timing of results relative to treatment decisions. Results: There were 2691 women with stages 0-III breast cancer. Forty-eight percent (1284/2691) had a documented high-risk family history, 57% (729/1284) of whom had a GC appointment. This was a significantly higher rate of GC compared to the 23% (181/778) of women with no family history and 6% (35/629) of women with no documentation of family history (p < 0.0001). Patients ≤ 45 years old attended a GC appointment 72% (199/278) of the time and 49% (135/278) had genetic test results, with 84% (113/135) receiving results before surgery. For women with test results available before surgery, 37% (119/322) had breast conserving surgery, compared to 60% (144/240) with test results disclosed post-operatively (p < 0.0001). Conclusions: Genetic testing is underutilized in a community cohort of women with breast cancer. Further analysis is needed to understand the impact genetic test results have on surgical decisions. Opportunities exist to improve current rates of appropriate GC/testing. ACCC will share results of quality improvement projects to illuminate which strategies hold promise in reducing the hereditary breast cancer GC/testing practice gap.

A computational network approach to identify predictive biomarkers and therapeutic combinations for anti-PD-1 immunotherapy in cancer

10.1101/2020.04.25.055616 ◽

2020 ◽

Cited By ~ 1

Author(s):

Chia-Chin Wu ◽

Y Alan Wang ◽

J Andrew Livingston ◽

Jianhua Zhang ◽

P. Andrew Futreal

Keyword(s):

Gene Network ◽

Target Genes ◽

Tumor Response ◽

Cytotoxic T Cells ◽

Predictive Biomarkers ◽

Cancer Genes ◽

Network Approach ◽

Tumor Resistance ◽

Mhc I ◽

Transcriptomic Data

Background: Despite remarkable success, only a subset of cancer patients have shown benefit from the anti-PD1 therapy. Therefore, there is a growing need to identify predictive biomarkers and therapeutic combinations for improving the clinical efficacy. Results: Based upon the hypothesis that aberrations of any gene that is close to MHC class I genes in the gene network are likely to deregulate the MHC I pathway and affect tumor response to anti-PD1, we developed a network approach to infer genes, pathways, and potential therapeutic target genes associated with response to PD-1/PD-L1 checkpoint immunotherapies in cancer. Our approach successfully identified genes (e.g. B2M and PTEN) and pathways (e.g. JAK/STAT and WNT) known to be associated with anti-PD1 response. Our prediction was further validated by 5 CRISPR gene sets associated with tumor resistance to cytotoxic T cells. Our results also showed that many cancer genes that act as hubs in the gene network may drive immune evasion through indirectly deregulating the MHC I pathway. The integration analysis of transcriptomic data of the 34 TCGA cancer types and our prediction reveals that MHC I-immunoregulations may be tissue-specific. The signature-based score, the MHC I association immunoscore (MIAS), calculated by integration of our prediction and TCGA melanoma transcriptomic data also showed a good correlation with patient response to anti-PD1 for 354 melanoma samples compiled from 5 cohorts. In addition, most targets of the 36 compounds that have been tested in clinical trials or used for combination treatments with anti-PD1 are in the top list of our prediction (AUC=0.833). Integration of drug target data with our top prediction further identified compounds that were recently shown to enhance tumor response to anti-PD1, such as inhibitors of GSK3B, CDK, and PTK2. Conclusion: Our approach is effective at identifying candidate genes and pathways associated with response to anti-PD-1 therapy, and can also be employed for in silico screening of potential compounds to enhance the efficacy of anti-PD1 agents against cancer.

Aging is associated with a systemic length-driven transcriptome imbalance

10.1101/691154 ◽

2019 ◽

Cited By ~ 2

Author(s):

Thomas Stoeger ◽

Rogan A. Grant ◽

Alexandra C. McQuattie-Pimentel ◽

Kishore Anekalla ◽

Sophia S. Liu ◽

...

Keyword(s):

Healthy Aging ◽

Splicing Factor ◽

Transcriptional Elongation ◽

Gene Length ◽

Transcriptomic Data ◽

Human Organs ◽

Common Basis ◽

Expression Of Genes ◽

Age Dependent ◽

Differential Expression Of Genes

Aging manifests itself through a decline in organismal homeostasis and a multitude of cellular and physiological functions [1]. Efforts to identify a common basis for vertebrate aging face many challenges; for example, while there have been documented changes in the expression of many hundreds of mRNAs, the results across tissues and species have been inconsistent [2]. We therefore analyzed age-resolved transcriptomic data from 17 mouse organs and 51 human organs using unsupervised machine learning [3–5] to identify the architectural and regulatory characteristics most informative on the differential expression of genes with age. We report a hitherto unknown phenomenon, a systemic age-dependent length-driven transcriptome imbalance that for older organisms disrupts the homeostatic balance between short and long transcript molecules for mice, rats, killifishes, and humans. We also demonstrate that in a mouse model of healthy aging, length-driven transcriptome imbalance correlates with changes in expression of splicing factor proline and glutamine rich (Sfpq), which regulates transcriptional elongation according to gene length [6]. Furthermore, we demonstrate that length-driven transcriptome imbalance can be triggered by environmental hazards and pathogens. Our findings reinforce the picture of aging as a systemic homeostasis breakdown and suggest a promising explanation for why diverse insults affect multiple age-dependent phenotypes in a similar manner.
