Chromatin-informed inference of transcriptional programs in gynecologic and basal breast cancers

Mapping Intimacies ◽

10.1101/333757 ◽

2018 ◽

Cited By ~ 3

Author(s):

Hatice U. Osmanbeyoglu ◽

Fumiko Shimizu ◽

Angela Rynne-Vidal ◽

Petar Jelinic ◽

Samuel C. Mok ◽

...

Keyword(s):

Transcription Factor ◽

Cell Line ◽

Chromatin Accessibility ◽

Patient Specific ◽

Breast Cancer Cell Lines ◽

Expression Data ◽

Rna Seq ◽

Large Tumor ◽

Data Set ◽

Basal Breast Cancer

ABSTRACT Epigenomic data on transcription factor occupancy and chromatin accessibility can elucidate the developmental origin of cancer cells and reveal the enhancer landscape of key oncogenic transcriptional regulators. However, in many cancers, epigenomic analyses have been limited, and computational methods to infer regulatory networks in tumors typically use expression data alone, or rely on transcription factor (TF) motifs in annotated promoter regions. Here, we develop a novel machine learning strategy called PSIONIC (patient-specific inference of networks informed by chromatin) to combine cell line chromatin accessibility data with large tumor expression data sets and model the effect of enhancers on transcriptional programs in multiple cancers. We generated a new ATAC-seq data set profiling chromatin accessibility in gynecologic and basal breast cancer cell lines and applied PSIONIC to 723 RNA-seq experiments from ovarian, uterine, and basal breast tumors as well as 96 cell line RNA-seq profiles. Our computational framework enables us to share information across tumors to learn patient-specific inferred TF activities, revealing regulatory differences between and within tumor types. Many of the identified TF regulators were significantly associated with survival outcome in basal breast, uterine serous and endometrioid carcinomas. Moreover, PSIONIC-predicted activity for MTF1 in cell line models correlated with sensitivity to MTF1 inhibition. Therefore, computationally dissecting the role of TFs in gynecologic cancers may ultimately advance personalized therapy.

Chromatin-informed inference of transcriptional programs in gynecologic and basal breast cancers

Nature Communications ◽

10.1038/s41467-019-12291-6 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 5

Author(s):

Hatice U. Osmanbeyoglu ◽

Fumiko Shimizu ◽

Angela Rynne-Vidal ◽

Direna Alonso-Curbelo ◽

Hsuan-An Chen ◽

...

Keyword(s):

Cell Line ◽

Expression Patterns ◽

Chromatin Accessibility ◽

Patient Specific ◽

Breast Cancer Cell Lines ◽

Breast Cancers ◽

Large Tumor ◽

Basal Breast Cancer ◽

Tumor Expression ◽

Tumor Types

Abstract Chromatin accessibility data can elucidate the developmental origin of cancer cells and reveal the enhancer landscape of key oncogenic transcriptional regulators. We develop a computational strategy called PSIONIC (patient-specific inference of networks informed by chromatin) to combine chromatin accessibility data with large tumor expression data and model the effect of enhancers on transcriptional programs in multiple cancers. We generate a new ATAC-seq data set profiling chromatin accessibility in gynecologic and basal breast cancer cell lines and apply PSIONIC to 723 patient and 96 cell line RNA-seq profiles from ovarian, uterine, and basal breast cancers. Our computational framework enables us to share information across tumors to learn patient-specific TF activities, revealing regulatory differences between and within tumor types. PSIONIC-predicted activity for MTF1 in cell line models correlates with sensitivity to MTF1 inhibition, showing the potential of our approach for personalized therapy. Many identified TFs are significantly associated with survival outcome. To validate PSIONIC-derived prognostic TFs, we perform immunohistochemical analyses in 31 uterine serous tumors for ETV6 and 45 basal breast tumors for MITF and confirm that the corresponding protein expression patterns are also significantly associated with prognosis.
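
The idea of inferring per-sample TF activities from expression and chromatin-derived motif information can be illustrated with a much simpler stand-in. The sketch below is not PSIONIC itself: it replaces the paper's information sharing across tumors with an independent ridge regression per sample, and the input layout (a genes-by-TFs matrix of motif hits in accessible regions), the function name, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def infer_tf_activity(motif_hits, expression, ridge=1.0):
    """Estimate one activity score per TF for each sample (illustrative only).

    motif_hits: (genes x TFs) matrix, e.g. counts of TF motifs in accessible
                regions near each gene (hypothetical input layout).
    expression: (genes x samples) matrix of expression values, assumed centered.
    Returns a (TFs x samples) matrix of inferred activities.
    """
    n_tfs = motif_hits.shape[1]
    # Closed-form ridge solution: (M^T M + lambda * I)^-1 M^T Y
    gram = motif_hits.T @ motif_hits + ridge * np.eye(n_tfs)
    return np.linalg.solve(gram, motif_hits.T @ expression)

# Synthetic data: 500 genes, 4 TFs, 2 samples with known activities.
rng = np.random.default_rng(0)
motif_hits = rng.poisson(1.0, size=(500, 4)).astype(float)
true_activity = np.array([[2.0, -1.0],
                          [0.0, 0.5],
                          [-1.5, 0.0],
                          [0.3, 1.0]])
expression = motif_hits @ true_activity + rng.normal(0, 0.1, size=(500, 2))

activity = infer_tf_activity(motif_hits, expression, ridge=1e-3)
print(np.round(activity, 2))
```

Fitting each sample independently keeps the example short; a method that shares information across tumors, as the abstract describes, would couple these per-sample fits rather than solve them separately.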

Quantification of differential transcription factor activity and multiomics-based classification into activators and repressors: diffTF

10.1101/368498 ◽

2018 ◽

Cited By ~ 3

Author(s):

Ivan Berest ◽

Christian Arnold ◽

Armando Reyes-Palomares ◽

Giovanni Palla ◽

Kasper Dindler Rasmussen ◽

...

Keyword(s):

Transcription Factor ◽

Case Studies ◽

Cell Types ◽

Chromatin Accessibility ◽

Lymphocytic Leukemia ◽

Transcriptional Activators ◽

Rna Seq ◽

Data Set ◽

Genome Wide

Transcription factor (TF) activity is an important read-out of cellular signalling pathways and is thus useful for assessing regulatory differences across conditions. However, current technologies lack the ability to simultaneously assess activity changes for multiple TFs and in particular to determine whether a specific TF acts globally as transcriptional repressor or activator. To this end, we introduce a widely applicable genome-wide method, diffTF, to assess differential TF activity and to classify TFs as activator or repressor (available at https://git.embl.de/grp-zaugg/diffTF). This is done by integrating any type of genome-wide chromatin accessibility data with RNA-Seq data and in-silico predicted TF binding sites. We corroborated the classification of TFs into repressors and activators by three independent analyses based on enrichments of active/repressive chromatin states, correlation of TF activity with gene expression, and activator- and repressor-specific chromatin footprints. To show the power of diffTF, we present two case studies: First, we applied diffTF to a large ATAC-Seq/RNA-Seq dataset comparing mutated and unmutated chronic lymphocytic leukemia samples, where we identified dozens of known (40%) and potentially novel (60%) TFs that are differentially active. We were also able to classify almost half of them as either repressor or activator. Second, we applied diffTF to a small ATAC-Seq/RNA-Seq data set comparing two cell types along the hematopoietic differentiation trajectory (multipotent progenitors – MPP – versus granulocyte-macrophage progenitors – GMP). Here we identified the known drivers of differentiation and found that the majority of the differentially active TFs are transcriptional activators. Overall, diffTF was able to recover the known TFs in both case studies, additionally identified TFs that have been less well characterized in the given condition, and provides a classification of the TFs into transcriptional activators and repressors.
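
As a rough illustration of the activator/repressor idea, one can ask whether a TF's RNA level rises or falls together with accessibility at its predicted binding sites across samples. The sketch below is a simplified stand-in and not diffTF's actual procedure; the function name, the fixed correlation threshold, and the toy numbers are assumptions.

```python
import numpy as np

def classify_tf(tf_expression, site_accessibility, threshold=0.3):
    """Label a TF by the sign of the expression/accessibility correlation.

    tf_expression:      TF RNA level per sample.
    site_accessibility: mean accessibility at the TF's predicted sites per sample.
    threshold:          hypothetical cutoff on |r|; a real analysis would
                        compare against a background distribution instead.
    """
    r = np.corrcoef(tf_expression, site_accessibility)[0, 1]
    if r >= threshold:
        return "activator", r
    if r <= -threshold:
        return "repressor", r
    return "undetermined", r

tf_expr = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
label_up, _ = classify_tf(tf_expr, np.array([2.0, 4.0, 5.0, 8.0, 10.0]))
label_down, _ = classify_tf(tf_expr, np.array([10.0, 8.0, 5.0, 4.0, 2.0]))
label_flat, _ = classify_tf(tf_expr, np.array([3.0, 5.0, 1.0, 5.0, 3.0]))
print(label_up, label_down, label_flat)
```

A fixed threshold is used only to keep the example short; it says nothing about statistical significance.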

Predicting transcription factor binding motifs from DNA-binding domains, chromatin accessibility and gene expression data

Nucleic Acids Research ◽

10.1093/nar/gkx358 ◽

2017 ◽

Vol 45 (10) ◽

pp. 5666-5677 ◽

Cited By ~ 5

Author(s):

Mahdi Zamanighomi ◽

Zhixiang Lin ◽

Yong Wang ◽

Rui Jiang ◽

Wing Hung Wong

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Dna Binding ◽

Gene Expression Data ◽

Chromatin Accessibility ◽

Expression Data ◽

Binding Motifs ◽

Dna Binding Domains ◽

Binding Domains ◽

Transcription Factor Binding Motifs

SAT-298 Integrative Single-Cell Transcriptomic and Epigenomic Landscape of Mouse Anterior Pituitary Cell Types

Journal of the Endocrine Society ◽

10.1210/jendso/bvaa046.593 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

Author(s):

Frederique Murielle Ruf-Zamojski ◽

Michel A Zamojski ◽

German Nudelman ◽

Yongchao Ge ◽

Natalia Mendelev ◽

...

Keyword(s):

Single Cell ◽

Cell Line ◽

Anterior Pituitary ◽

Cell Types ◽

Chromatin Accessibility ◽

Pituitary Cell ◽

Integrated Analysis ◽

Pituitary Cells ◽

Rna Seq ◽

Cell Type

Abstract The pituitary gland is a critical regulator of the neuroendocrine system. To further our understanding of the classification, cellular heterogeneity, and regulatory landscape of pituitary cell types, we performed and computationally integrated single cell (SC)/single nucleus (SN) resolution experiments capturing RNA expression, chromatin accessibility, and DNA methylation state from mouse dissociated whole pituitaries. Both SC and SN transcriptome analysis and promoter accessibility identified the five classical hormone-producing cell types (somatotropes, gonadotropes (GT), lactotropes, thyrotropes, and corticotropes). GT cells distinctively expressed transcripts for Cga, Fshb, Lhb, Nr5a1, and Gnrhr in SC RNA-seq and SN RNA-seq. This was matched in SN ATAC-seq with GTs specifically showing open chromatin at the promoter regions for the same genes. Similarly, the other classically defined anterior pituitary cells displayed transcript expression and chromatin accessibility patterns characteristic of their own cell type. This integrated analysis identified additional cell-types, such as a stem cell cluster expressing transcripts for Sox2, Sox9, Mia, and Rbpms, and a broadly accessible chromatin state. In addition, we performed bulk ATAC-seq in the LβT2b gonadotrope-like cell line. While the FSHB promoter region was closed in the cell line, we identified a region upstream of Fshb that became accessible by the synergistic actions of GnRH and activin A, and that corresponded to a conserved region identified by a polycystic ovary syndrome (PCOS) single nucleotide polymorphism (SNP). Although this locus appears closed in deep sequencing bulk ATAC-seq of dissociated mouse pituitary cells, SN ATAC-seq of the same preparation showed that this site was specifically open in mouse GT, but closed in 14 other pituitary cell type clusters. 
This discrepancy highlighted the detection limit of a bulk ATAC-seq experiment in a subpopulation, as GT represented ~5% of this dissociated anterior pituitary sample. These results identified this locus as a candidate for explaining the dual dependence of Fshb expression on GnRH and activin/TGFβ signaling, and potential new evidence for upstream regulation of Fshb. The pituitary epigenetic landscape provides a resource for improved cell type identification and for the investigation of the regulatory mechanisms driving cell-to-cell heterogeneity. Additional authors not listed due to abstract submission restrictions: N. Seenarine, M. Amper, N. Jain (ISMMS).
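
The detection-limit argument comes down to simple mixing arithmetic: a bulk measurement averages over all cells, so a site open only in a rare cell type contributes in proportion to that type's abundance. The sketch below assumes an idealized signal (1.0 for fully open, 0.0 for closed) and lumps all non-gonadotrope clusters into a single hypothetical "other" group; only the ~5% gonadotrope fraction comes from the abstract.

```python
def bulk_signal(cell_fractions, accessibility):
    """Expected bulk signal as the abundance-weighted mean across cell types."""
    return sum(cell_fractions[c] * accessibility[c] for c in cell_fractions)

# Gonadotropes (GT) make up about 5% of the sample; the site is open only there.
fractions = {"gonadotrope": 0.05, "other": 0.95}
site_open = {"gonadotrope": 1.0, "other": 0.0}

expected = bulk_signal(fractions, site_open)
print(f"Expected bulk signal relative to a fully open site: {expected:.2f}")
```

Under these assumptions the bulk signal is about one twentieth of what a uniformly open site would give, which is consistent with the locus appearing closed in bulk ATAC-seq while being open in the single-nucleus GT cluster.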

Genome-wide association between transcription factor expression and chromatin accessibility reveals chromatin state regulators

10.1101/043414 ◽

2016 ◽

Author(s):

David Felix Lamparter ◽

Daniel Marbach ◽

Rico Rueedi ◽

Sven Bergmann ◽

Zoltan Kutalik

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Chromatin Accessibility ◽

Binding Motif ◽

Open Chromatin ◽

Data Sets ◽

Transcription Factor Binding Motif ◽

Data Set ◽

Data Driven Approach ◽

Transcription Factor Expression

To better understand genome regulation, it is important to uncover the role of transcription factors in the process of chromatin structure establishment and maintenance. Here we present a data-driven approach to systematically characterize transcription factors that are relevant for this process. Our method uses a linear mixed modeling approach to combine data sets of transcription factor binding motif enrichments in open chromatin and gene expression across the same set of cell lines. Applying this approach to the ENCODE data set we confirm already known and imply numerous novel transcription factors in playing a role in the establishment or maintenance of open chromatin.

DOP23 Single-cell RNA sequencing identifies an important role for class I histone-deacetylase enzymes in intestinal myofibroblasts from patients with Crohn’s Disease strictures

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjab073.062 ◽

2021 ◽

Vol 15 (Supplement_1) ◽

pp. S062-S062

Author(s):

A Lewis ◽

B Pan-Castillo ◽

G Berti ◽

C Felice ◽

H Gordon ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Line ◽

Rna Sequencing ◽

Histone Deacetylase ◽

Chromatin Accessibility ◽

Collagen I ◽

Class I ◽

Rna Seq ◽

Single Cell Rna Sequencing

Abstract Background Histone-deacetylase (HDAC) enzymes are a broad class of ubiquitously expressed enzymes that modulate histone acetylation, chromatin accessibility and gene expression. In models of inflammatory bowel disease (IBD), HDAC inhibitors such as valproic acid (VPA) are proven anti-inflammatory agents, and evidence suggests that they also inhibit fibrosis in non-intestinal organs. However, the role of HDAC enzymes in stricturing Crohn’s disease (SCD) has not been characterised; this is key to understanding the molecular mechanism and developing novel therapies. Methods To evaluate HDAC expression in the intestine of SCD patients, we performed unbiased single-cell RNA sequencing (sc-RNA-seq) of over 10,000 cells isolated from full-thickness surgical resection specimens of non-SCD (NSCD; n=2) and SCD intestine (n=3). Approximately 1,000 fibroblasts were identified for further analysis, including a distinct cluster of myofibroblasts. Changes in gene expression were compared between myofibroblasts and other resident intestinal fibroblasts using the sc-RNA-seq analysis pipeline in Partek. Changes in HDAC expression and markers of HDAC activity (H3K27ac) were confirmed by immunohistochemistry in FFPE tissue from patient-matched NSCD and SCD intestine (n=14 pairs). The function of HDACs in intestinal fibroblasts in the CCD-18co cell line and primary Crohn’s disease (CD) myofibroblast cultures (n=16 cultures) was assessed using VPA, a class I HDAC inhibitor. Cells were analysed using a variety of molecular techniques including ATAC-seq, gene expression arrays, qPCR, western blot and immunofluorescent protein analysis. Results Class I HDAC (HDAC1, p= 2.11E-11; HDAC2, p= 4.28E-11; HDAC3, p= 1.60E-07; and HDAC8, p= 2.67E-03) expression was increased in myofibroblasts compared to other intestinal fibroblast subtypes. 
IHC also showed an increase in the percentage of stromal HDAC2-positive cells, coupled with a decrease in the percentage of H3K27ac-positive cells, in the mucosa overlying SCD intestine relative to matched NSCD areas. In the CCD-18co cell line and primary myofibroblast cultures, VPA reduced chromatin accessibility at Collagen-I gene promoters and suppressed their transcription. VPA also inhibited TGFB-induced up-regulation of Collagen-I, in part by inhibiting TGFB1I1/SMAD4 signalling. TGFB1I1 was identified as a mesenchymal-specific target of VPA and siRNA knockdown of TGFB1I1 was sufficient to suppress TGFB-induced up-regulation of Collagen-I. Conclusion In SCD patients, class I HDAC expression is increased in myofibroblasts. Class I HDAC inhibitors impair TGFB-signalling and inhibit Collagen-I expression. Selective targeting of TGFB1I1 offers the opportunity to increase treatment specificity by selectively targeting mesenchymal cells.

Deep Mining of Natural Genetic Variation in Erythroid Cells Reveals New Insights about In Vivo Transcription Factor Binding and Chromatin Accessibility

Blood ◽

10.1182/blood.v128.22.3879.3879 ◽

2016 ◽

Vol 128 (22) ◽

pp. 3879-3879

Author(s):

Vivek Behera ◽

Perry Evans ◽

Carolyne J Face ◽

Laavanya Sankaranarayanan ◽

Gerd A. Blobel

Keyword(s):

Transcription Factor ◽

Genetic Variation ◽

Transcription Factors ◽

Cell Line ◽

Cell Lines ◽

Genetic Variants ◽

Binding Sites ◽

Chromatin Accessibility ◽

Erythroid Cell

Abstract Erythroid transcription factors (TFs) control gene expression programs, lineage decisions, and disease outcomes. How transcription factors contact DNA has been studied extensively in vitro, but in vivo binding characteristics are less well understood as they are influenced in a reciprocal manner by chromatin accessibility and neighboring transcription factors. Here, we present a comparative analysis approach that takes advantage of non-coding sequence variation between functionally equivalent erythroid cell lines to conduct an in-depth analysis of erythroid TF binding profiles and chromatin features. Specifically, we analyzed ChIP-seq datasets to identify millions of genetic non-coding variants between the mouse erythroleukemia cell line (MEL), a GATA1-inducible erythroid progenitor cell line (G1E-ER4), and primary murine erythroblast cells. We found that while these cell lines are highly positively correlated in chromatin features, larger differences in TF binding intensity are correlated with higher degrees of genetic variation between cell lines. We next examined discriminatory genetic variants between the cell lines that are located in ChIP-seq peaks of the erythroid transcription factor GATA1. Hundreds of such variants fall within GATA1 motifs. Differential GATA1 binding intensities associated with the variants revealed nucleotide positions that contribute most to in vivo GATA1 chromatin occupancy and identified which alternative nucleotides are most likely to disrupt binding. Notably, this additional information about GATA1's in vivo nucleotide binding preferences improved prediction of GATA1 binding sites genome-wide. We applied similar approaches to determine the bp-resolution in vivo binding preferences of TAL1/SCL and CTCF. We additionally identified thousands of discriminatory genetic variants within GATA1 sites that fall outside canonical GATA elements but within binding sites of other known TFs. 
Association of these variants with differential GATA1 binding intensities revealed that the hematopoietic transcription factors TAL1/SCL and KLF1 positively regulate GATA1 chromatin occupancy. Strikingly, we identified a number of motifs not previously implicated in cooperating with GATA1 that positively impact GATA1 chromatin binding. Notably, we also defined motifs associated with negative regulation of GATA1 chromatin occupancy. Applying a similar analysis to TAL1/SCL and CTCF revealed additional motifs involved in regulating the chromatin occupancy of these TFs. Finally, we associated discriminatory genetic variation between erythroid cell lines with large changes in sub-kb-scale DNase hypersensitivity. We found that single base pair substitutions within or near a number of erythroid TF motifs, including that for the RUNX family of nuclear factors, are strongly associated with changes in chromatin accessibility. Our findings use novel methods in comparative ChIP-seq and DNase-seq analysis to reveal new insights about the genetic basis for erythroid TF chromatin occupancy and chromatin accessibility. Disclosures No relevant conflicts of interest to declare.

Coordinated analysis of exon and intron data reveals novel differential gene expression changes

Scientific Reports ◽

10.1038/s41598-020-72482-w ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Hamid R. Eghbalnia ◽

William W. Wilfinger ◽

Karol Mackey ◽

Piotr Chomczynski

Keyword(s):

Gene Networks ◽

Expression Data ◽

Rna Seq ◽

Data Set ◽

Exon Expression ◽

Regulatory Processes ◽

Pathway Gene ◽

Immune Pathway ◽

Archived Data

Abstract RNA-Seq expression analysis currently relies primarily upon exon expression data. The recognized role of introns during translation, and the presence of substantial RNA-Seq counts attributable to introns, provide the rationale for the simultaneous consideration of both exon and intron data. We describe here a method for the coordinated analysis of exon and intron data by investigating their relationship within individual genes and across samples, while taking into account changes in both variability and expression level. This coordinated analysis of exon and intron data offers strong evidence for significant differences that distinguish the profiles of the exon-only expression data from the combined exon and intron data. One advantage of our proposed method, called matched change characterization for exons and introns (MEI), is its straightforward applicability to existing archived data using small modifications to standard RNA-Seq pipelines. Using MEI, we demonstrate that when data are examined for changes in variability across control and case conditions, novel differential changes can be detected. Notably, when MEI criteria were employed in the analysis of an archived data set involving polyarthritic subjects, the number of differentially expressed genes was expanded by sevenfold. More importantly, the observed changes in exon and intron variability with statistically significant false discovery rates could be traced to specific immune pathway gene networks. The application of MEI analysis provides a strategy for incorporating the significance of exon and intron variability and further developing the role of using both exons and intron sequencing counts in studies of gene regulatory processes.

Gene fusions in glioblastoma: Results of Gliocat project.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.2042 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. 2042-2042

Author(s):

Ainhoa Hernandez Gonzalez ◽

Anna Esteve-Codina ◽

Cristina Carrato ◽

Ana Munoz ◽

Estela Pineda ◽

...

Keyword(s):

Gene Fusion ◽

Molecular Subtype ◽

Malignant Gliomas ◽

Low Frequency ◽

Large Data ◽

Patient Specific ◽

Gene Fusions ◽

Rna Seq ◽

Data Set ◽

Novel Treatments

2042 Background: Malignant gliomas are genetically heterogeneous diseases. The development of sequencing techniques, such as RNA-Sequencing, has identified many gene rearrangements encoding novel oncogenic fusions. Gene fusion discovery can potentially lead to the development of novel treatments; however, studies of gene fusions in glioma remain limited. Methods: The GLIOCAT project studied 139 patient samples of newly diagnosed glioblastoma who had received the standard first-line treatment from 2004 to 2015, to identify gene fusion events from glioblastoma transcriptome data (RNA-Seq). The molecular subtype could be studied in 124 cases. RNA-Seq reads were mapped against the reference human genome with STAR-fusion version 0.7.0, specifically, with FusionInspector validate (http://star-fusion.github.io). Two other platforms, FusionHub (https://fusionhub.persistent.co.in) and Oncofuse (www.unav.es/genetica/oncofuse.html), were applied to eliminate false positives and fusions previously described in healthy tissue and to predict the oncogenic potential of each fusion. Results: A total of 61 patients showed 103 different fusions, with a median of two fusions per sample. The majority of gene fusions were intrachromosomal, and the most frequently involved chromosome was 12, followed by 7. In addition, fusions were more common in patients with MGMT promoter methylation, TCGA classical subtype and 18 IGS subtype. There were no differences in age, sex, type of surgery or long survivors (> 30 months). Ten fusions were already described in cancer, including three in gliomas (FRS2-KIF5A, EGFR-SEPT14 and FGFR3-TACC3). Of the detected fusions, 22 included an oncogene or protooncogene. Conclusions: In our study, we report the landscape of gene fusions from a large data set of glioblastomas analyzed by RNA-seq. The majority of the fusions were private fusions. A minority of these recur at a low frequency, but as many as a quarter of them included an oncogene or protooncogene. 
RNA-seq of GBM patient samples is an important tool for the identification of patient-specific fusions that could drive personalized therapy. Furthermore, we plan to validate these gene fusions.
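
The headline proportions can be recomputed directly from the counts reported in the abstract (139 patients, 61 with at least one fusion, 103 different fusions, 22 involving an oncogene or protooncogene). The variable names below are illustrative; no other data are assumed.

```python
patients_total = 139
patients_with_fusion = 61
fusions_total = 103
fusions_with_oncogene = 22

# Share of patients carrying at least one detected fusion.
patient_fraction = patients_with_fusion / patients_total
# Share of distinct fusions that include an oncogene or protooncogene.
oncogene_fraction = fusions_with_oncogene / fusions_total

print(f"Patients with a fusion: {patient_fraction:.1%}")
print(f"Fusions with an oncogene: {oncogene_fraction:.1%}")
```

On these counts, the oncogene-containing share works out to a little over one fifth, so "as many as a quarter" in the conclusions reads as an approximation.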

Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome

Nucleic Acids Research ◽

10.1093/nar/gkt1300 ◽

2013 ◽

Vol 42 (5) ◽

pp. 2820-2832 ◽

Cited By ~ 14

Author(s):

Nicolas Philippe ◽

Elias Bou Samra ◽

Anthony Boureux ◽

Alban Mancheron ◽

Florence Rufflé ◽

...

Keyword(s):

Human Genome ◽

Rna Sequencing ◽

Dynamic Range ◽

Tiling Array ◽

Expression Data ◽

Rna Seq ◽

Sequencing Data ◽

Data Set ◽

Protein Coding ◽

Protein Coding Genes

Abstract Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
