Whole genome and RNA sequencing of 1,220 cancers reveals hundreds of genes deregulated by rearrangement of cis-regulatory elements

Mapping Intimacies ◽

10.1101/099861 ◽

2017 ◽

Cited By ~ 3

Author(s):

Yiqun Zhang ◽

Fengju Chen ◽

Nuno A. Fonseca ◽

Yao He ◽

Masashi Fujita ◽

...

Keyword(s):

Gene Expression ◽

Gene Promoter ◽

Regulatory Elements ◽

Whole Genome ◽

Expression Data ◽

Structural Variants ◽

Altered Expression ◽

Genes Expression ◽

Whole Genomes ◽

Pan Cancer

Abstract: Using a dataset of somatic structural variants (SVs) in cancers from 2,658 patients (1,220 with corresponding gene expression data), we identified hundreds of genes for which the nearby presence (within 100 kb) of an SV breakpoint was associated with altered expression. For the vast majority of these genes, expression was increased rather than decreased with the corresponding SV event. Well-known up-regulated cancer-associated genes impacted by this phenomenon included TERT, MDM2, CDK4, ERBB2, CD274, PDCD1LG2, and IGF2. SVs upstream of TERT involved ~3% of cancer cases and were most frequent in liver-biliary, melanoma, sarcoma, stomach, and kidney cancers. SVs associated with up-regulation of PD1 and PDL1 genes involved ~1% of non-amplified cases. For many genes, SVs were significantly associated with either increased numbers or greater proximity of enhancer regulatory elements near the gene. DNA methylation near the gene promoter was often increased with a nearby SV breakpoint, which may involve inactivation of repressor elements.

Abbreviations: PCAWG, the Pan-Cancer Analysis of Whole Genomes project; SV, Structural Variant.
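To make the 100 kb proximity criterion concrete, the following is a minimal sketch of how genes with a nearby SV breakpoint could be flagged. It is not the authors' pipeline: the function name `genes_near_breakpoints`, the tuple layouts, and the gene names and coordinates are all hypothetical, and the statistical association with expression reported in the abstract is not modelled here.

```python
from bisect import bisect_left

WINDOW = 100_000  # 100 kb, the distance quoted in the abstract


def genes_near_breakpoints(genes, breakpoints, window=WINDOW):
    """Return names of genes with at least one breakpoint within `window` bp.

    genes:       iterable of (name, chrom, start, end)  -- hypothetical layout
    breakpoints: iterable of (chrom, pos)               -- hypothetical layout
    """
    # Group breakpoint positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, pos in breakpoints:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = []
    for name, chrom, start, end in genes:
        positions = by_chrom.get(chrom, [])
        # First breakpoint at or after the left edge of the extended window.
        i = bisect_left(positions, start - window)
        # It counts as a hit if it also lies before the right edge.
        if i < len(positions) and positions[i] <= end + window:
            hits.append(name)
    return hits


# Made-up example data: GENE_A has a breakpoint 70 kb downstream, GENE_B has none nearby.
example_genes = [
    ("GENE_A", "chr5", 1_000_000, 1_050_000),
    ("GENE_B", "chr5", 5_000_000, 5_010_000),
]
example_breakpoints = [("chr5", 1_120_000), ("chr12", 5_000_500)]
print(genes_near_breakpoints(example_genes, example_breakpoints))
```

In a real analysis, the resulting flag would be one covariate in a per-gene expression model; the sketch only covers the proximity test.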

D-GPM: A Deep Learning Method for Gene Promoter Methylation Inference

Genes ◽

10.3390/genes10100807 ◽

2019 ◽

Vol 10 (10) ◽

pp. 807 ◽

Cited By ~ 1

Author(s):

Pan ◽

Liu ◽

Wen ◽

Liu ◽

Zhang ◽

...

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Promoter Methylation ◽

Methylation Level ◽

Target Genes ◽

Gene Promoter ◽

Support Vector ◽

Whole Genome ◽

Expression Levels ◽

Gene Expression Levels

Whole-genome bisulfite sequencing provides comprehensive profiling of gene methylation levels, but is limited by its high cost. Recent studies have partitioned genes into landmark genes and target genes and suggested that the landmark gene expression levels capture adequate information to reconstruct the target gene expression levels. This inspired us to propose that the methylation level of the promoters in landmark genes might be adequate to reconstruct the promoter methylation level of target genes, which would eventually reduce the cost of promoter methylation profiling. Here, we propose a deep learning model called Deep-Gene Promoter Methylation (D-GPM) to predict the whole-genome promoter methylation level based on the promoter methylation profile of the landmark genes from The Cancer Genome Atlas (TCGA). D-GPM-15%-7000 × 5, the optimal architecture of D-GPM, achieves the lowest overall mean absolute error (MAE) and the highest overall Pearson correlation coefficient (PCC) on the testing data, with values of 0.0329 and 0.8186, respectively. Additionally, D-GPM outperforms the regression tree (RT), linear regression (LR), and the support vector machine (SVM) in 95.66%, 92.65%, and 85.49% of the target genes by virtue of its relatively lower MAE and in 98.25%, 91.00%, and 81.56% of the target genes based on its relatively higher PCC, respectively. More importantly, D-GPM predominates in predicting 79.86% and 78.34% of the target genes according to the model distribution of the least MAE and the highest PCC, respectively.
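The two evaluation metrics named in the abstract, MAE and PCC, can be computed for a single target gene as in the sketch below. This only illustrates the standard definitions; the function names and the toy methylation values are assumptions and do not come from the D-GPM code or data.

```python
from math import sqrt


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy promoter methylation levels (made up) for one target gene across four samples.
measured = [0.10, 0.50, 0.90, 0.30]
predicted = [0.20, 0.50, 0.70, 0.35]
print(mean_absolute_error(measured, predicted))
print(pearson_correlation(measured, predicted))
```

A lower MAE and a higher PCC both indicate better reconstruction, which is why the abstract reports per-gene comparisons on each metric separately.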

Integrative Analysis Reveals Comprehensive Altered Metabolic Genes Linking with Tumor Epigenetics Modification in Pan-Cancer

BioMed Research International ◽

10.1155/2019/6706354 ◽

2019 ◽

Vol 2019 ◽

pp. 1-17 ◽

Cited By ~ 1

Author(s):

Yahui Shi ◽

Jinfen Wei ◽

Zixi Chen ◽

Yuchen Yuan ◽

Xingsong Li ◽

...

Keyword(s):

Gene Expression ◽

Dna Methylation ◽

Histone Acetylation ◽

Epigenetic Modification ◽

Metabolic Reprogramming ◽

The Cancer Genome Atlas ◽

Altered Expression ◽

Metabolic Genes ◽

Cancer Types ◽

Pan Cancer

Background. Cancer cells undergo various rewiring of metabolism and dysfunction of epigenetic modification to support their biosynthetic needs. Although the major features of metabolic reprogramming have been elucidated, the global metabolic genes linked to epigenetics have been overlooked in pan-cancer studies. Objectives. Identifying the critical, differentially expressed metabolic signatures that contribute to epigenetic alterations across cancer types is an urgent issue for providing potential targets for cancer therapy. Method. Differential gene expression and DNA methylation were analyzed using data from 5,726 samples in The Cancer Genome Atlas (TCGA). Results. Firstly, we analyzed the differential expression of metabolic genes and found that cancer underwent overall metabolic reprogramming, which exhibited an expression trend similar to that in data from the Gene Expression Omnibus (GEO) database. Secondly, the regulatory network of histone acetylation and DNA methylation according to altered expression of metabolism genes was summarized in our results. Then, survival analysis showed that high expression of DNMT3B was associated with poorer overall survival in 5 cancer types. Integrative analysis of altered methylation and expression revealed specific genes influenced by DNMT3B through DNA methylation across cancers. These genes do not overlap across cancer types and are involved in different functional annotations depending on the tissue, which indicates that DNMT3B might influence DNA methylation in a tissue-specific manner. Conclusions. Our research clarifies some key metabolic genes, ACLY, SLC2A1, KAT2A, and DNMT3B, which are the most disordered and indirectly contribute to the dysfunction of histone acetylation and DNA methylation in cancer. We also found some potential genes in different cancer types influenced by DNMT3B. Our study highlights possible epigenetic disorders resulting from the deregulation of metabolic genes in pan-cancer and suggests potential therapeutic targets for the clinical treatment of human cancer.

Discovery of Novel Recurrent Mutations in Childhood Early T-Cell Precursor Acute Lymphoblastic Leukemia by Whole Genome Sequencing - a Report From the St Jude Children's Research Hospital - Washington University Pediatric Cancer Genome Project

Blood ◽

10.1182/blood.v118.21.68.68 ◽

2011 ◽

Vol 118 (21) ◽

pp. 68-68

Author(s):

Jinghui Zhang ◽

Li Ding ◽

Linda Holmfeldt ◽

Gang Wu ◽

Susan L. Heatley ◽

...

Keyword(s):

Gene Expression ◽

Acute Lymphoblastic Leukemia ◽

T Cell ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Lymphoblastic Leukemia ◽

Whole Genome ◽

Structural Variants ◽

Cell Precursor ◽

Recurrent Mutations

Abstract 68. Early T-cell precursor acute lymphoblastic leukemia (ETP ALL) is characterized by an immature T-lineage immunophenotype (cCD3+, CD1a-, CD8- and CD5dim), aberrant expression of myeloid and stem cell markers, a distinct gene expression profile and very poor outcome. The underlying genetic basis of this form of leukemia is unknown. Here we report results of whole genome sequencing (WGS) of tumor and normal DNA from 12 children with ETP ALL. Genomes were sequenced to 30-fold haploid coverage using the Illumina GAIIx platform, and all putative somatic sequence and structural variants were validated. The frequency of mutations in 43 genes was assessed in a recurrence cohort of 52 ETP and 42 non-ETP T-ALL samples from patients enrolled in St Jude, Children's Oncology Group and AIEOP trials. Transcriptomic resequencing was performed for two WGS cases, and whole exome sequencing for three ETP ALL cases in the recurrence cohort. We identified 44 interchromosomal translocations (mean 4 per patient, range 0–12), 32 intrachromosomal translocations (mean 3, 0–7), 53 deletions (mean 4, 0–10) and 16 insertions (mean 1, 0–5). Three cases exhibited a pattern of complex rearrangements suggestive of a single cellular catastrophe (“chromothripsis”), two of which had mutations targeting mismatch and DNA repair (MLH3 and DCLRE1C). While no single chromosomal alteration was present in all cases, 10 of 12 ETP ALLs harbored chromosomal rearrangements, several of which involved complex multichromosomal translocations and resulted in the expression of chimeric in-frame novel fusion genes disrupting hematopoietic regulators, including ETV6-INO80D, NAP1L1-MLLT10, RUNX1-EVX1 and NUP214-SQSTM1, each occurring in a single case. An additional ETP case with the ETV6-INO80D fusion was identified in the recurrence cohort. 
Additionally, 51% of structural variants had breakpoints in genes, including those with roles in hematopoiesis and leukemogenesis, and genes also targeted by mutation in other cases (MLH3, SUZ12, RUNX1). We identified a high frequency of activating mutations in genes regulating cytokine receptor and Ras signalling in ETP ALL (67.2% of ETP compared to 19% of non-ETP T-ALL) including NRAS (17%), FLT3 (14%), JAK3 (9%), SH2B3 (or LNK; 9%), IL7R (8%), JAK1 (8%), KRAS (3%), and BRAF (2%). Seven cases (5 ETP, 2 non-ETP) harbored in-frame insertion mutations in the transmembrane domain of IL7R, which were transforming when expressed in murine cell lines, and resulted in enhanced colony formation when expressed in primary murine hematopoietic cells. The IL7R mutations resulted in constitutive Jak-Stat activation in these cell lines and primary leukemic cells expressing these mutations. Fifty-eight percent of ETP cases (compared to 17% of non-ETP cases) harbored mutations known or predicted to disrupt hematopoietic and lymphoid development, including ETV6 (33%), RUNX1 (16%), IKZF1 (14%), GATA3 (10%), EP300 (5%) and GATA2 (2%). GATA3 regulates early T cell development, and mutations in this gene were observed exclusively in ETP ALL. The mutations were commonly biallelic, and were clustered at R276, a residue critical for binding of GATA3 to DNA. Strikingly, mutations disrupting chromatin modifying genes were also highly enriched in ETP ALL. Genes encoding the polycomb repressor complex 2 (EZH2, SUZ12 and EED), which mediates histone 3 lysine 27 (H3K27) trimethylation, were deleted or mutated in 42% of ETP ALL compared to 12% of non-ETP T-ALL. In addition, alterations of the H3K36 trimethylase SETD2 were observed in 5 ETP cases, but not in non-ETP ALL. We also identified recurrent mutations in genes that have not previously been implicated in hematopoietic malignancies including RELN, DNM2, ECT2L, HNRNPA1 and HNRNPR. 
Using gene set enrichment analysis, we demonstrate that the gene expression profile of ETP ALL shares features not only with normal human hematopoietic stem cells, but also with leukemic initiating cells (LIC) purified from patients with acute myeloid leukemia (AML). These results indicate that mutations that drive proliferation, impair differentiation and disrupt histone modification cooperate to induce an aggressive leukemia with an aberrant immature phenotype. The similarity of the gene expression pattern with that observed in the LIC of AML raises the possibility that myeloid-directed therapies might improve the outcome of ETP ALL. Disclosures: Evans: St. Jude Children's Research Hospital: Employment, Patents & Royalties; NIH & NCI: Research Funding; Aldagen: Membership on an entity's Board of Directors or advisory committees.

Linked-read whole-genome sequencing resolves common and private structural variants in multiple myeloma

10.1101/2021.12.09.471893 ◽

2021 ◽

Author(s):

Lucía Peña Pérez ◽

Nicolai Frengen ◽

Julia Hauenstein ◽

Charlotte Gran ◽

Charlotte Gustafsson ◽

...

Keyword(s):

Multiple Myeloma ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Cohort Analysis ◽

Genomic Medicine ◽

Molecular Classification ◽

Regulatory Elements ◽

Copy Number Variations ◽

Whole Genome ◽

Structural Variants

Multiple myeloma (MM) is an incurable and aggressive plasma cell malignancy characterized by a complex karyotype with multiple structural variants (SVs) and copy number variations (CNVs). Linked-read whole-genome sequencing (lrWGS) allows for refined detection and reconstruction of SVs by providing long-range genetic information from standard short-read sequencing. This makes lrWGS an attractive solution for capturing the full genomic complexity of MM. Here we show that high-quality lrWGS data can be generated from low numbers of FACS sorted cells without DNA purification. Using this protocol, we analyzed FACS sorted MM cells from 37 MM patients with lrWGS. We found high concordance between lrWGS and FISH for the detection of recurrent translocations and CNVs. Outside of the regions investigated by FISH, we identified >150 additional SVs and CNVs across the cohort. Analysis of the lrWGS data allowed for resolving the structure of diverse SVs affecting the MYC and t(11;14) loci causing the duplication of genes and gene regulatory elements. In addition, we identified private SVs causing the dysregulation of genes recurrently involved in translocations with the IGH locus and show that these can alter the molecular classification of the MM. Overall, we conclude that lrWGS allows for the detection of aberrations critical for MM prognostics and provides a feasible route for providing comprehensive genetics. Implementing lrWGS could provide more accurate clinical prognostics, facilitate genomic medicine initiatives, and greatly improve the stratification of patients included in clinical trials.

Whole genome sequencing identifies common and rare structural variants contributing to hematologic traits in the NHLBI TOPMed program

10.1101/2021.12.16.21267871 ◽

2021 ◽

Author(s):

Marsha M. Wheeler ◽

Adrienne M Stilp ◽

Shuquan Rao ◽

Bjarni V Halldorsson ◽

Doruk V Beyter ◽

...

Keyword(s):

Blood Cell ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Regulatory Elements ◽

Chromatin Domain ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Structural Variants ◽

Genome Wide

Genome-wide association studies (GWAS) have identified thousands of single nucleotide variants and small indels that contribute to the genetic architecture of hematologic traits. While structural variants (SVs) are known to cause rare blood or hematopoietic disorders, the genome-wide contribution of SVs to quantitative blood cell trait variation is unknown. Here we utilized SVs detected from whole genome sequencing (WGS) in ancestrally diverse participants of the NHLBI TOPMed program (N=50,675). Using single variant tests, we assessed the association of common and rare SVs with red cell-, white cell-, and platelet-related quantitative traits. The results show 33 independent SVs (23 common and 10 rare) reaching genome-wide significance. The majority of significant association signals (N=27) replicated in independent datasets from deCODE genetics and the UK BioBank. Moreover, most trait-associated SVs (N=24) are within 1Mb of previously-reported GWAS loci. SV analyses additionally discovered an association between a complex structural variant on 17p11.2 and white blood cell-related phenotypes. Based on functional annotation, the majority of significant SVs are located in non-coding regions (N=26) and predicted to impact regulatory elements and/or local chromatin domain boundaries in blood cells. We predict that several trait-associated SVs represent the causal variant. This is supported by genome-editing experiments which provide evidence that a deletion associated with lower monocyte counts leads to disruption of an S1PR3 monocyte enhancer and decreased S1PR3 expression.

Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data

F1000Research ◽

10.12688/f1000research.10529.1 ◽

2016 ◽

Vol 5 ◽

pp. 2927 ◽

Cited By ~ 9

Author(s):

Linh Nguyen ◽

Cuong C Dang ◽

Pedro J. Ballester

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cell Line ◽

Cell Lines ◽

Gene Expression Data ◽

Single Gene ◽

Cancer Cell Line ◽

Expression Data ◽

Gene Markers ◽

Pan Cancer

Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data. Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than K-fold cross-validation. Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG. Conclusions: We now know that this type of model can predict in vitro tumour response to these drugs. These models can thus be further investigated on in vivo tumour models.
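The precision/recall trade-off described in the results can be illustrated with a small sketch. The function name `precision_recall` and the toy labels are assumptions made for illustration; they are not taken from the study's code or from GDSC data.

```python
def precision_recall(actual, predicted):
    """Compute (precision, recall) for binary labels.

    actual, predicted: equal-length sequences of booleans, where True means
    "cell line is sensitive to the drug".
    """
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy case: a marker that flags only one of four sensitive cell lines, but is
# right whenever it fires, has perfect precision and poor recall.
actual = [True, True, True, True, False, False]
marker_calls = [True, False, False, False, False, False]
print(precision_recall(actual, marker_calls))  # (1.0, 0.25)
```

This mirrors the pattern reported for single-gene markers: few false positives, but most sensitive cell lines go undetected.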

The B chromosome of Pseudococcus viburni: a selfish chromosome that exploits whole-genome meiotic drive

10.1101/2021.08.30.458195 ◽

2021 ◽

Author(s):

Isabelle M. Vea ◽

Andrés G. de la Filia ◽

Kamil S. Jaron ◽

Andrew Joseph Mongue ◽

Francisco J. Ruiz-Ruano ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Meiotic Drive ◽

B Chromosome ◽

Male Meiosis ◽

B Chromosomes ◽

Whole Genome ◽

Expression Data ◽

Fair Process ◽

Pseudococcus Viburni

Meiosis, the key process underlying sexual reproduction, is generally a fair process: each chromosome has a 50% chance of being included in each gamete. However, in some organisms meiosis has become highly aberrant, with some chromosomes having a higher chance of making it into gametes than others. Yet why and how such systems evolve remains unclear. Here we study the unusual reproductive genetics of mealybugs, in which only maternal-origin chromosomes are included in the gametes during male meiosis, while paternally derived chromosomes degrade. This whole-genome meiotic drive occurs in all males and is evolutionarily conserved. However, one species, the obscure mealybug Pseudococcus viburni, has a segregating B chromosome that increases in frequency by escaping paternal genome elimination. Here we present whole-genome and gene expression data from laboratory lines with and without B chromosomes. These data allow us to identify B-linked sequences including >70 protein-coding genes as well as a B-specific satellite repeat that makes up a significant proportion of the chromosome. We also used these data to investigate the evolutionary origin of the B chromosome. The few paralogs between the B and the core genome are distributed throughout the genome, showing that it is unlikely that the B originated through a simple duplication of one of the autosomes. We also find that while many of the B-linked genes do not have paralogs within the P. viburni genome, they do show orthology with genes in other hemipteran insects, suggesting that the B might have originated from fission of one of the autosomes, possibly followed by further translocations of individual genes. Finally, in order to understand the mechanisms by which the B is able to escape elimination when paternally derived, we generated gene expression data for males and females with and without B chromosomes. 
We find that at the developmental stage when meiosis is taking place, only a small number of B-linked genes show significant expression. Only one gene was significantly over-expressed during male meiosis, which is when the drive occurs: an acetyltransferase involved in H3K56Ac, which has a putative role in meiosis and is therefore a promising candidate for further studies. Together, these results form a promising foundation for studying the mechanisms of meiotic drive in a system that is uniquely suited for this approach.

Predicting master transcription factors from pan-cancer expression data

10.1101/839142 ◽

2019 ◽

Cited By ~ 4

Author(s):

Jessica Reddy ◽

Marcos A. S. Fonseca ◽

Rosario I Corona ◽

Robbin Nameki ◽

Felipe Segato Dezem ◽

...

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Regulatory Elements ◽

Ca 125 ◽

Tumor Type ◽

Expression Data ◽

Primary Tumors ◽

Cancer Types ◽

Tumor Types ◽

Pan Cancer

The function of critical developmental regulators can be subverted by cancer cells to control expression of oncogenic transcriptional programs. These "master transcription factors" (MTFs) are often essential for cancer cell survival and represent vulnerabilities that can be exploited therapeutically. The current approaches to identify candidate MTFs examine super-enhancer associated transcription factor-encoding genes with high connectivity in network models. This relies on chromatin immunoprecipitation-sequencing (ChIP-seq) data, which is technically challenging to obtain from primary tumors, and is currently unavailable for many cancer types and clinically relevant subtypes. In contrast, gene expression data are more widely available, especially for rare tumors and subtypes where MTFs have yet to be discovered. We have developed a predictive algorithm called CaCTS (Cancer Core Transcription factor Specificity) to identify candidate MTFs using pan-cancer RNA-sequencing data from The Cancer Genome Atlas. The algorithm identified 273 candidate MTFs across 34 tumor types and recovered known tumor MTFs. We also made novel predictions, including for cancer types and subtypes for which MTFs have not yet been characterized. Clustering based on MTF predictions reproduced anatomic groupings of tumors that share 1-2 lineage-specific candidates, but also dictated functional groupings, such as a squamous group that comprised five tumor subtypes sharing 3 common MTFs. PAX8, SOX17, and MECOM were candidate factors in high-grade serous ovarian cancer (HGSOC), an aggressive tumor type where the core regulatory circuit is currently uncharacterized. PAX8, SOX17, and MECOM are required for cell viability and lie proximal to super-enhancers in HGSOC cells. ChIP-seq revealed that these factors co-occupy HGSOC regulatory elements globally and co-bind at critical gene loci including MUC16 (CA-125). 
Addiction to these factors was confirmed in studies using THZ1 to inhibit transcription in HGSOC cells, suggesting early down-regulation of these genes may be responsible for cytotoxic effects of THZ1 on HGSOC models. Identification of MTFs across 34 tumor types and 140 subtypes, especially for those with limited understanding of transcriptional drivers, paves the way to therapeutic targeting of MTFs in a broad spectrum of cancers.

Assessing the Gene Regulatory Landscape in 1,188 Human Tumors

10.1101/225441 ◽

2017 ◽

Cited By ~ 4

Author(s):

C Calabrese ◽

K Lehmann ◽

L Urban ◽

F Liu ◽

S Erkek ◽

...

Keyword(s):

Gene Expression ◽

Genetic Variation ◽

Large Scale ◽

Human Cancer ◽

Expression Patterns ◽

Regulatory Elements ◽

Whole Genome ◽

Somatic Variation ◽

Specific Expression ◽

Large Scale Assessment

Abstract: Cancer is characterised by somatic genetic variation, but the effect of the majority of non-coding somatic variants and the interface with the germline genome are still unknown. We analysed the whole genome and RNA-Seq data from 1,188 human cancer patients as provided by the Pan-cancer Analysis of Whole Genomes (PCAWG) project to map cis expression quantitative trait loci of somatic and germline variation and to uncover the causes of allele-specific expression patterns in human cancers. The availability of the first large-scale dataset with both whole genome and gene expression data enabled us to uncover the effects of the non-coding variation on cancer. In addition to confirming known regulatory effects, we identified novel associations between somatic variation and expression dysregulation, in particular in distal regulatory elements. Finally, we uncovered links between somatic mutational signatures and gene expression changes, including TERT and LMO2, and we explained the inherited risk factors in APOBEC-related mutational processes. This work represents the first large-scale assessment of the effects of both germline and somatic genetic variation on gene expression in cancer and creates a valuable resource cataloguing these effects.

Abstract 5104: Pan-cancer classification on gene expression data by neural network

10.1158/1538-7445.am2019-5104 ◽

2019 ◽

Author(s):

Kijin Yu ◽

Bong-Hyun Kim ◽

Peter Chang Whan Lee

Keyword(s):

Neural Network ◽

Gene Expression ◽

Gene Expression Data ◽

Cancer Classification ◽

Expression Data ◽

Pan Cancer
