Deep learning with implicit handling of tissue-specific phenomena predicts tumor DNA accessibility and immune activity

2017 ◽  
Author(s):  
Kamil Wnuk ◽  
Jeremi Sudol ◽  
Kevin B. Givechian ◽  
Patrick Soon-Shiong ◽  
Shahrooz Rabizadeh ◽  
...  

Abstract: DNA accessibility is a key dynamic feature of chromatin regulation that can potentiate transcriptional events and tumor progression. Recently, neural networks have begun to make it possible to explore the impact of mutations on DNA accessibility and transcriptional regulation by demonstrating state-of-the-art prediction of chromatin features from DNA sequence data in specific tissue types. We demonstrate enhancements to improve such tissue-specific prediction performance, and show that by extending models with RNA-seq expression input, they can be applied to novel tissue samples whose types were not present in training. We show that our expression-informed model achieved particularly consistent accuracy predicting DNA accessibility at promoter and promoter flank regions of the genome.

Leveraging this new tool to analyze tumor genomes across tissues, we provide a first glimpse of the DNA accessibility landscape across The Cancer Genome Atlas (TCGA). Our analysis of the Lung Adenocarcinoma (LUAD) cohort reveals that viewing tumors from the perspective of accessibility at promoters uniquely highlights several immune pathways inversely correlated with an overall more open chromatin state. Further, through identification of accessibility sites linked with differential gene expression in immune-inflamed LUAD tumors and training of a classifier ensemble, we show that patterns of predicted chromatin state are discriminative of immune activity across many tumor types, with direct implications for patient prognosis. We see such models playing a significant future role in matching patients to appropriate immunotherapy treatment regimens, as well as in analysis of other conditions where epigenetic state may play a significant role.

Significance Statement: DNA accessibility determines whether proteins have access to DNA-binding sites and is a key dynamic feature that influences regulation of gene expression that differentiates cells. We improve and extend a neural network model in a way that expands its application domain beyond studying the impact of genetic sequence and mutations on DNA accessibility in specific cell types, to tissues for which training data is unavailable.

Leveraging our tool to analyze tumor genomes, we demonstrate that in lung adenocarcinomas the accessibility perspective uniquely highlights immune pathways inversely correlated with a more accessible DNA state. Further, we show that accessibility patterns learned from even a single tumor type can discriminate immune inflammation across many cancers, often with direct relation to patient prognosis.

2012 ◽  
Vol 22 (6) ◽  
pp. 1069-1080 ◽  
Author(s):  
Justin Cotney ◽  
Jing Leng ◽  
Sunghee Oh ◽  
Laura E. DeMare ◽  
Steven K. Reilly ◽  
...  

2021 ◽  
Author(s):  
Hao Lu ◽  
Luyu Ma ◽  
Lei Li ◽  
Cheng Quan ◽  
Yiming Lu ◽  
...  

Noncoding genomic variants constitute the majority of trait-associated genome variations; however, identification of functional noncoding variants is still a challenge in human genetics, and a method systematically assessing the impact of regulatory variants on gene expression and linking them to potential target genes is still lacking. Here we introduce a deep neural network (DNN)-based computational framework, RegVar, that can accurately predict the tissue-specific impact of noncoding regulatory variants on target genes. We show that, by robustly learning the genomic characteristics of massive variant-gene expression associations in a variety of human tissues, RegVar vastly surpasses all current noncoding variant prioritization methods in predicting regulatory variants under different circumstances. The unique features of RegVar make it an excellent framework for assessing the regulatory impact of any variant on its putative target genes in a variety of tissues. RegVar is available as a webserver at http://regvar.cbportal.org/.


2021 ◽  
Author(s):  
Yifeng Tao ◽  
Xiaojun Ma ◽  
Georgios I. Laliotis ◽  
Adler Guerrero Zuniga ◽  
Drake Palmer ◽  
...  

Abstract: Cancer is a disease of gene dysregulation, where cells acquire somatic and epigenetic alterations that drive aberrant cellular signaling. These alterations adversely impact transcriptional programs and cause profound changes in gene expression. Ultimately, interpreting patient somatic alterations within context-specific regulatory programs will facilitate personalized therapeutic decisions for each individual. Towards this goal, we develop a partially interpretable neural network model with encoder-decoder architecture, called Chromatin-informed Inference of Transcriptional Regulators Using Self-attention mechanism (CITRUS), to model the impact of somatic alterations on cellular states and further onto downstream gene expression programs. The encoder module employs a self-attention mechanism to model the contextual impact of somatic alterations in a tumor-specific manner. Furthermore, the model uses a layer of hidden nodes to explicitly represent the state of transcription factors (TFs), and the decoder learns the relationships between TFs and their target genes guided by the sparse prior based on TF binding motifs in the open chromatin regions of tumor samples. We apply CITRUS to genomic, mRNA sequencing and ATAC-seq data from tumors of 17 cancer types profiled by The Cancer Genome Atlas. Our computational framework enables us to share information across tumors to learn patient-specific TF activities, revealing regulatory program similarities and differences between and within tumor types. We show that CITRUS not only outperforms the competing models in predicting RNA expression, but also yields biological insights in delineating TFs associated with somatic alterations in individual tumors. We also validate the differential activity of TFs associated with mutant PIK3CA in breast cancer cell line and xenograft models using a panel of PI3K pathway inhibitors.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 3381-3381
Author(s):  
Marco De Gobbi ◽  
Jim R. Hughes ◽  
Karen M. Lower ◽  
Chris J. Derry ◽  
Emanuele Marchi ◽  
...  

Abstract 3381: It is well established that the level of gene expression can vary significantly between normal individuals, and that the majority of this variation is due to naturally occurring genomic variability caused by single nucleotide polymorphisms (SNPs). Therefore, identifying functional cis-regulatory polymorphisms and understanding how they influence gene expression is an important new task in many areas of medical research, including molecular hematology. We have previously shown that an entirely new form of alpha-thalassemia is caused by a gain of function regulatory SNP in an unremarkable non-coding region in the alpha-globin cluster. This SNP creates a novel, functional GATA site, which recruits a tissue-specific transcription factor (TF) complex. This creates a new promoter-like element, which interferes with activation of the globin genes (De Gobbi et al. Science 2006,312:1215–1257). Here, to investigate the extent and the impact of this class of regulatory SNP, using ChIP-Seq we characterized differences in the occupancy of Scl/TAL-1 (a tissue-specific TF critical for erythroid maturation) in the erythroblasts of two individuals from the same ethnic background (Caucasian 1, C1, and Caucasian 2, C2). Sequence reads from two biological replicates of each individual were merged and aligned to the human reference genome (NCBI36/hg18) and a total of 2936 Scl/TAL-1 bound regions were identified. Using two de novo motif finding algorithms (MEME and DREME), we identified GATA (WGATAR) and E-box (CAGMTG) sites as the preferred sequences associated with in vivo binding of Scl/TAL-1. In addition, other motifs were enriched at the Scl/TAL-1 targets; among these were binding sites for known TFs (Sp1/Klf, RUNX1 and NFE2). To identify differentially bound regions between C1 and C2, a two-class paired-test, Rank Product analysis (500 permutations, FDR < 0.2) was performed with MeV4.6 TM4 Software.
About 1% (25/2936) of these sites showed differential binding. Differences were mostly associated with SNPs directly affecting or lying adjacent to known TF consensus binding sites, and deviations from the GATA or E-box consensus motifs corresponded to the inability of the sequence to bind Scl/TAL-1. Since it has been previously shown that the function of active transcriptional elements can be predicted on the basis of chromatin signatures (e.g. enhancers marked by H3K4me1 and promoters marked by H3K4me3), to further characterize the Scl/TAL-1 differentially occupied sites, we asked which chromatin signatures are associated with these regions. H3K4me1 and H3K4me3 ChIP-Seq experiments, together with analyses of publicly available data sets, showed that most of the SNPs responsible for variation in the recruitment of Scl/TAL-1 (23/25) lie in DNA sequences that have chromatin signatures predictive of enhancer elements, suggesting a potential long-range function in modulating gene expression. Finally, Scl/TAL-1 ChIP-Seq analysis of erythroblasts of a third individual from a different ethnic background (African-Caribbean, A3) revealed more distinctive targets, including a well-known regulatory SNP at the promoter of the DARC gene (encoding the Duffy blood group), which alters a GATA binding motif in A3, conferring the malaria-resistant Duffy-null phenotype. Given the exponential growth in genome-wide association studies by which numerous SNPs are being either associated with hematological parameters, or implicated in the etiology of hematologic disorders, this study elucidates molecular mechanisms which might account for phenotypic diversity and highlights the importance of carrying out functional characterization of non-coding polymorphisms found to be associated with disease risk. Disclosures: No relevant conflicts of interest to declare.
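
The consensus motifs named in the abstract above, WGATAR and CAGMTG, are written in IUPAC nucleotide codes (W = A or T, R = A or G, M = A or C). As a minimal sketch, and not the MEME/DREME discovery procedure the authors used, the following Python shows how such consensus strings can be expanded into regular expressions and scanned on both strands of a sequence. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
import re

# Subset of the IUPAC nucleotide codes; enough for WGATAR and CAGMTG.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "M": "[AC]", "Y": "[CT]",
    "K": "[GT]", "S": "[CG]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(motif):
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G and T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (start, strand) hits; start is 0-based on the forward strand."""
    seq = seq.upper()
    width = len(motif)
    # A lookahead group lets overlapping occurrences be reported.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    hits = {(m.start(), "+") for m in pattern.finditer(seq)}
    # Scan the reverse complement and map hits back to forward coordinates.
    for m in pattern.finditer(reverse_complement(seq)):
        hits.add((len(seq) - m.start() - width, "-"))
    return sorted(hits)


if __name__ == "__main__":
    example = "TTAGATAGCCCAGCTGTT"  # hypothetical sequence
    print(find_motif(example, "WGATAR"))
    print(find_motif(example, "CAGMTG"))
```

A palindromic E-box such as CAGCTG is reported once per strand at the same coordinate, so a downstream analysis would typically collapse the two strands before counting sites.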


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 2186-2186
Author(s):  
Barbara Spitzer ◽  
Olga A Guryanova ◽  
Omar Abdel-Wahab ◽  
Nicole Kucine ◽  
Mazhar Adli ◽  
...  

Abstract: Molecular studies have shown that specific somatic mutations impact therapeutic response and overall outcome in acute myeloid leukemia (AML) and have informed the development of molecularly targeted therapies. Previous studies have shown that the FLT3-ITD mutant disease allele predicts a poor prognosis in AML. Despite this important insight and the established role of FLT3-ITD mutations in AML pathogenesis, the impact of this mutation on gene regulation has not been extensively investigated. We hypothesized that transcriptional and epigenetic studies using genetically accurate murine models, cell lines, and primary AML samples would allow us to identify how FLT3 activation induces changes in gene expression and chromatin state. To assess the impact of FLT3-ITD associated FLT3 activation on gene expression, we performed RNA-sequencing studies on two FLT3-ITD cell lines (MOLM-13 and MV4-11) in the presence/absence of AC-220, a potent, specific FLT3 inhibitor. We also treated mice expressing a constitutive FLT3-ITD knock-in allele with AC-220 or vehicle, and performed RNA-sequencing on purified granulocyte-macrophage progenitors (GMPs). We assessed the impact of transient (4-12 hours drug treatment) and chronic (10-14 days) FLT3 inhibition on gene expression; we hypothesized that chronic drug exposure would result in more robust FLT3-mutant gene expression changes. In each case, the effects of FLT3-ITD activation/inhibition on gene expression were compared to RNA-seq data from FLT3-ITD mutant patients from TCGA. We first investigated the impact of short-term and chronic drug exposure on FLT3-ITD dependent gene expression in vitro. Comparison of short-term drug and vehicle treated cells revealed 159 differentially expressed (DE) genes (Benjamini-Hochberg false discovery rate (BH FDR) p < 0.05 and an absolute log2 fold change (FC) > 0.8). By contrast, we found that chronic FLT3 inhibition identified 743 DE genes.
Comparison between the acutely and chronically treated cell lines revealed overlap of only 19 genes, suggesting important differences between the acute and steady-state effects of FLT3-inhibition. We found more significant overlap between chronic FLT3-inhibitor gene expression and FLT3-ITD specific gene expression in TCGA, demonstrating that long-term drug exposure more robustly delineates mutant-specific effects on gene expression. We next investigated the impact of short and long term FLT3-inhibition on gene expression in vivo. Analysis of DE genes identified 93 genes in the acutely treated mice vs. vehicle, and 274 genes in chronically treated mice (BH FDR p < 0.05 and absolute log2 FC > 0.5). Only 12 DE genes were shared between these two perturbations compared with vehicle control. We then compared these gene expression signatures to FLT3-ITD specific gene expression from TCGA; we noted a significant inverse correlation between the signatures of chronic FLT3 inhibition in vivo with FLT3-ITD specific gene expression in TCGA (R² = 0.47), but no correlation between the gene expression changes of acute FLT3 inhibition and FLT3-ITD TCGA patients (R² = 0.09). We next integrated the FLT3 signatures from our in vivo work and TCGA with ChIP-sequencing for H3K4me3 and H3K27me3 in primary samples with FLT3-ITD compared to normal controls. We found that 3.6% of DE genes in our in vivo system, and 7.2% of DE genes in TCGA, had significant changes in H3K4me3 or H3K27me3. The most common alteration in chromatin state observed with FLT3 activation was an increase in H3K4me3 and transcriptional activation, with a smaller set of genes showing increased H3K27me3 and reduced expression, consistent with FLT3-mediated epigenetic repression. Motif analysis showed that DE loci with significant changes in chromatin state were enriched for ELF5, NF-E2, Pu.1, and Bach1 binding sequences, implicating these transcription factors in mediating FLT3-dependent gene expression effects.
By studying the global transcriptional changes that occur with chronic, steady-state FLT3 inhibition in in vitro and in vivo systems, we identified a set of gene expression changes characteristic of FLT3-activation. In addition, integrating changes in gene expression and chromatin state allowed us to identify loci with alterations in epigenetic state in the setting of FLT3-ITD associated FLT3 activation, and to identify candidate transcription factors that mediate FLT3-dependent effects on gene expression. Disclosures: No relevant conflicts of interest to declare.
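
The differential-expression criteria quoted in the abstract above (BH FDR p < 0.05 together with an absolute log2 fold change above 0.8 in vitro, or 0.5 in vivo) can be illustrated with a short sketch. This is a plain-Python version of the standard Benjamini-Hochberg adjustment followed by a threshold filter; it is an assumption about how such a filter might be applied and does not reproduce the authors' pipeline. The gene labels, fold changes and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, log2_fc, pvalues, fdr=0.05, min_abs_log2_fc=0.8):
    """Keep genes passing both the FDR and the fold-change threshold."""
    padj = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, fc, q in zip(genes, log2_fc, padj)
        if q < fdr and abs(fc) > min_abs_log2_fc
    ]


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    genes = ["g1", "g2", "g3", "g4"]
    log2_fc = [1.2, -0.9, 0.3, 2.0]
    pvalues = [0.001, 0.01, 0.03, 0.5]
    print(differentially_expressed(genes, log2_fc, pvalues))
```

Passing `min_abs_log2_fc=0.5` would correspond to the looser in vivo threshold mentioned in the abstract.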


2019 ◽  
Author(s):  
Pengfei Dong ◽  
Xiaoyu Tu ◽  
Haoxuan Li ◽  
Jianhua Zhang ◽  
Donald Grierson ◽  
...  

Abstract: Chromatin is not randomly packaged in the nucleus, and its organization plays important roles in transcription regulation. Using in situ Hi-C, we have compared the 3D chromatin architectures of rice mesophyll and endosperm, foxtail millet bundle sheath and mesophyll, and maize bundle sheath, mesophyll and endosperm tissues. We have also profiled their DNA methylation, open chromatin, histone modification and gene expression to investigate whether chromatin structural dynamics are associated with changes in epigenomic features. We found that plant global A/B compartment partitions are stable across tissues, while local A/B compartments show tissue-specific dynamics that are associated with differential gene expression. Plant domains are largely stable across tissues, while rare domain border changes are often associated with gene activation. Genes inside plant domains are not conserved across species, and lack significant co-expression behavior, unlike those in mammalian cells. When comparing syntenic gene pairs, we found that maize genes involved in gene island chromatin loops have shorter genomic distances in smaller genomes without gene island loops, such as rice and foxtail millet, suggesting that they have conserved functions. Our study revealed that the 3D configuration of plant chromatin is also complex and dynamic, with unique features that need to be further examined.


2020 ◽  
Author(s):  
Mitra Ansariola ◽  
Valerie N. Fraser ◽  
Sergei A. Filichkin ◽  
Maria G. Ivanchenko ◽  
Zachary A. Bright ◽  
...  

Abstract: Across tissues, gene expression is regulated by a combination of determinants, including the binding of transcription factors (TFs), along with other aspects of cellular state. Recent studies emphasize the importance of both genetic and epigenetic states – TF binding sites and binding site chromatin accessibility have emerged as potentially causal determinants of tissue specificity. To investigate the relative contributions of these determinants, we constructed three genome-scale datasets for both root and shoot tissues of the same Arabidopsis thaliana plants: TSS-seq data to identify Transcription Start Sites, OC-seq data to identify regions of Open Chromatin, and RNA-seq data to assess gene expression levels. For genes that are differentially expressed between root and shoot, we constructed a machine learning model predicting tissue of expression from chromatin accessibility and TF binding information upstream of TSS locations. The resulting model was highly accurate (over 90% auROC and auPRC), and our analysis of model contributions (feature weights) strongly suggests that patterns of TF binding sites within ∼500 nt TSS-proximal regions are predominant explainers of tissue of expression in most cases. Thus, in plants, cis-regulatory control of tissue-specific gene expression appears to be primarily determined by TSS-proximal sequences, and rarely by distal enhancer-like accessible chromatin regions. This study highlights the exciting future possibility of a native TF site-based design process for the tissue-specific targeting of plant gene promoters.


2021 ◽  
Author(s):  
Alex RJ Lima ◽  
Saloe B Poubel ◽  
Juliana N Roson ◽  
Loyze PO de Lima ◽  
Hellida M Costa-Silva ◽  
...  

Background: Genomic organization and gene expression regulation in trypanosomes are remarkable because protein-coding genes are organized into codirectional gene clusters with unrelated functions. Moreover, there is no dedicated promoter for each gene, resulting in polycistronic gene transcription, with posttranscriptional control playing a major role. Nonetheless, these parasites harbor epigenetic modifications at critical regulatory genome features that change dynamically among parasite stages and are not fully understood. Results: Here, we investigated the impact of chromatin changes in a setting dominated by posttranscriptional control, examining the parasite Trypanosoma cruzi and its differentiation program with genome-wide approaches supported by transmission electron microscopy. The integration of FAIRE and MNase-seq data, two complementary epigenomic approaches, enabled us to identify differences in T. cruzi genome compartments, putative transcriptional start regions and virulence factors. In addition, we detected developmental chromatin regulation at tRNA loci (tDNA), which seems to be linked to the translation regulatory mechanism required for parasite differentiation. Strikingly, a positive correlation was observed between active chromatin and steady-state transcription levels. Conclusion: Taken together, our results indicate that chromatin changes reflect the unusual gene expression regulation of trypanosomes and the differences among parasite developmental stages, even in the context of a lack of canonical transcriptional control of protein-coding genes.


2019 ◽  
Author(s):  
Kamil Wnuk ◽  
Jeremi Sudol ◽  
Kevin Givechian ◽  
Patrick Soon-Shiong ◽  
Shahrooz Rabizadeh ◽  
...  
