Annotations capturing cell-type-specific TF binding explain a large fraction of disease heritability

2018 ◽  
Author(s):  
Bryce van de Geijn ◽  
Hilary Finucane ◽  
Steven Gazal ◽  
Farhad Hormozdiari ◽  
Tiffany Amariuta ◽  
...  

Abstract: It is widely known that regulatory variation plays a major role in complex disease and that cell-type-specific binding of transcription factors (TFs) is critical to gene regulation, but genomic annotations from directly measured TF binding information are not currently available for most cell-type-TF pairs. Here, we construct cell-type-specific TF binding annotations by intersecting sequence-based TF binding predictions with cell-type-specific chromatin data; this strategy addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context, and the limitation that sequence-based predictions are generally not cell-type-specific. We evaluated different combinations of sequence-based TF predictions and chromatin data by partitioning the heritability of 49 diseases and complex traits (average N=320K) using stratified LD score regression with the baseline-LD model (which is not cell-type-specific). We determined that 100 bp windows around MotifMap sequence-based TF binding predictions intersected with a union of six cell-type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6x vs 7.3x; P = 9 × 10⁻¹⁴ for difference) and a 12% increase in cell-type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that intersecting sequence-based TF predictions with cell-type-specific chromatin information can help refine genome-wide association signals.
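The enrichment figures quoted in this abstract can be read through a short arithmetic sketch. It assumes the usual definition of heritability enrichment in stratified LD score regression (the proportion of heritability attributed to an annotation divided by the proportion of SNPs in that annotation). The function names and the 2%/20% example values are hypothetical and are not taken from the paper.

```python
def enrichment(prop_h2: float, prop_snps: float) -> float:
    """Heritability enrichment: share of heritability over share of SNPs."""
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


def relative_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, expressed as a fraction."""
    return new / old - 1.0


if __name__ == "__main__":
    # Hypothetical annotation: 2% of SNPs explaining 20% of heritability.
    print(enrichment(0.20, 0.02))
    # Rounded enrichments quoted in the abstract (11.6x vs 7.3x).
    print(relative_increase(11.6, 7.3))
```

With the rounded values 11.6 and 7.3 the relative increase comes out at roughly 0.59, so the reported 58% presumably reflects unrounded estimates; that reading is an assumption, not something the abstract states.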

2019 ◽  
Vol 29 (7) ◽  
pp. 1057-1067 ◽  
Author(s):  
Bryce van de Geijn ◽  
Hilary Finucane ◽  
Steven Gazal ◽  
Farhad Hormozdiari ◽  
Tiffany Amariuta ◽  
...  

Abstract: Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TFs) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10⁻¹⁴ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.
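The annotation strategy described in this abstract (windows around predicted motif sites, intersected with a union of chromatin marks) can be illustrated with a minimal interval sketch. This is not the authors' pipeline: the half-open coordinate convention, the choice to centre each 100 bp window on the motif midpoint, the single-chromosome simplification, and every function name are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union of possibly overlapping intervals, returned sorted and disjoint."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def windows(motif_sites: List[Interval], size: int = 100) -> List[Interval]:
    """Fixed-size windows centred on each motif midpoint (an assumption)."""
    half = size // 2
    out: List[Interval] = []
    for start, end in motif_sites:
        mid = (start + end) // 2
        out.append((max(0, mid - half), mid + half))
    return out


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two sorted, disjoint interval lists."""
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def tf_annotation(motif_sites: List[Interval],
                  chromatin_marks: List[List[Interval]],
                  size: int = 100) -> List[Interval]:
    """Motif windows restricted to the union of all chromatin marks."""
    chromatin_union = merge([iv for mark in chromatin_marks for iv in mark])
    return intersect(merge(windows(motif_sites, size)), chromatin_union)


if __name__ == "__main__":
    motifs = [(1000, 1010), (5000, 5012)]  # hypothetical motif hits
    marks = [[(900, 1000)], [(990, 1100)], [(7000, 7100)]]  # hypothetical marks
    print(tf_annotation(motifs, marks))
```

In the hypothetical example, only the first motif window overlaps the merged chromatin intervals, so only that window survives in the annotation; the second motif lies in a region with no chromatin mark and is dropped.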


2021 ◽  
Author(s):  
Rujin Wang ◽  
Danyu Lin ◽  
Yuchao Jiang

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.


2011 ◽  
Vol 22 (1) ◽  
pp. 9-24 ◽  
Author(s):  
B.-K. Lee ◽  
A. A. Bhinge ◽  
A. Battenhouse ◽  
R. M. McDaniell ◽  
Z. Liu ◽  
...  

2019 ◽  
Author(s):  
K.A.B. Gawronski ◽  
W. Bone ◽  
Y. Park ◽  
E. Pashos ◽  
X. Wang ◽  
...  

Background: Genome-wide association studies have identified 150+ loci associated with lipid levels. However, the genetic mechanisms underlying most of these loci are not well understood. Recent work indicates that changes in the abundance of alternatively spliced transcripts contribute to complex trait variation. Consequently, identifying genetic loci that associate with alternative splicing in disease-relevant cell types and determining the degree to which these loci are informative for lipid biology is of broad interest. Methods and Results: We analyze gene splicing in 83 sample-matched induced pluripotent stem cell (iPSC) and hepatocyte-like cell (HLC) lines (n=166), as well as in an independent collection of primary liver tissues (n=96). We observe that transcript splicing is highly cell-type specific, and the genes that are differentially spliced between iPSCs and HLCs are enriched for metabolism pathway annotations. We identify 1,381 HLC splicing quantitative trait loci (sQTLs) and 1,462 iPSC sQTLs and find that sQTLs are often shared across cell types. To evaluate the contribution of sQTLs to variation in lipid levels, we conduct colocalization analysis using lipid genome-wide association data. We identify 19 lipid-associated loci that colocalize either with an HLC expression quantitative trait locus (eQTL) or sQTL. Only one locus colocalizes with both an sQTL and eQTL, indicating that sQTLs contribute information about GWAS loci that cannot be obtained by analysis of steady-state gene expression alone. Conclusions: These results provide an important foundation for future efforts that use iPSC and iPSC-derived cells to evaluate genetic mechanisms influencing both cardiovascular disease risk and complex traits in general.


Blood ◽  
2021 ◽  
Author(s):  
Bon Q Trinh ◽  
Simone Ummarino ◽  
Yanzhou Zhang ◽  
Alexander K Ebralidze ◽  
Mahmoud A Bassal ◽  
...  

The mechanism underlying cell type-specific gene induction conferred by ubiquitous transcription factors as well as disruptions caused by their chimeric derivatives in leukemia is not well understood. Here we investigate whether RNAs coordinate with transcription factors to drive myeloid gene transcription. In an integrated genome-wide approach surveying for gene loci exhibiting concurrent RNA- and DNA-interactions with the broadly expressed transcription factor RUNX1, we identified the long noncoding RNA LOUP. This myeloid-specific and polyadenylated lncRNA induces myeloid differentiation and inhibits cell growth, acting as a transcriptional inducer of the myeloid master regulator PU.1. Mechanistically, LOUP recruits RUNX1 to both the PU.1 enhancer and the promoter, leading to the formation of an active chromatin loop. In t(8;21) acute myeloid leukemia, wherein RUNX1 is fused to ETO, the resulting oncogenic fusion protein RUNX1-ETO limits chromatin accessibility at the LOUP locus, causing inhibition of LOUP and PU.1 expression. These findings highlight the important role of the interplay between cell type-specific RNAs and transcription factors as well as their oncogenic derivatives in modulating lineage-gene activation and raise the possibility that RNA regulators of transcription factors represent alternative targets for therapeutic development.


2021 ◽  
Author(s):  
John M Rouhana ◽  
Jiali Wang ◽  
Gokcen Eraslan ◽  
Shankara Anand ◽  
Andrew R Hamel ◽  
...  

Summary: ECLIPSER was developed to identify pathogenic cell types and cell type-specific genes that may affect complex disease susceptibility and trait variation by integrating single cell data with known GWAS loci. ECLIPSER maps genes to GWAS loci for a given complex trait based on expression and splicing quantitative trait loci (e/sQTLs) and other functional data, and tests whether the mapped genes are enriched for cell type-specific expression in particular cell types using single-cell/nucleus RNA-seq data from one or more tissues of interest. A Bayesian Fisher's exact test is used to compute fold-enrichment significance. We demonstrate the application of ECLIPSER on various skin diseases and traits using snRNA-seq of healthy human skin samples. Availability and Implementation: The python source code and documentation for ECLIPSER and a Jupyter notebook for generating output tables and figures are available at https://github.com/segrelabgenomics/ECLIPSER. The source code for GWASvar2gene that maps genes to GWAS loci based on e/sQTLs is available at https://github.com/segrelabgenomics/GWASvar2gene. The analysis presented here used data from GTEx (https://gtexportal.org/home/datasets) and Open Targets Genetics (https://genetics-docs.opentargets.org/data-access/graphql-api), but can also be applied to other GWAS variant lists and QTL studies. Data used to reproduce the results of the paper are available in Supplementary data.
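The kind of gene-set enrichment test mentioned in this summary can be sketched with a classical one-sided Fisher's exact test, computed from the hypergeometric tail. This sketch is not ECLIPSER's code, and it uses the standard frequentist test rather than the Bayesian variant the summary names. The function names, argument names, and example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes
    K: background genes that are cell type-specific
    n: genes mapped to GWAS loci
    k: mapped genes that are cell type-specific
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts are inconsistent")
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible table configurations contribute nothing to the tail.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed fraction of cell type-specific genes over the background fraction."""
    return (k / n) / (K / N)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 40 mapped genes are cell type-specific,
    # against 500 of 15000 genes in the background.
    print(fold_enrichment(12, 40, 500, 15000))
    print(fisher_enrichment_p(12, 40, 500, 15000))
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the tail sum; for very large gene sets a log-space implementation would be faster, at the cost of more code.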


2021 ◽  
pp. gr.275723.121
Author(s):  
Jill E Moore ◽  
Xiao-Ou Zhang ◽  
Shaimae I Elhajjajy ◽  
Kaili Fan ◽  
Henry E Pratt ◽  
...  

Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated, nor do they contain information on cell type-specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks were primarily proximal to GENCODE-annotated TSSs and were concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3' ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations were supported by epigenomic and other transcriptomic datasets. To demonstrate the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI genome-wide association studies (GWAS) catalog and identified new candidate GWAS genes. Overall, our work demonstrates the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.


2020 ◽  
Vol 21 (17) ◽  
pp. 6141
Author(s):  
Keunsoo Kang ◽  
Yoonjung Choi ◽  
Hoo Hyun Kim ◽  
Kyung Hyun Yoo ◽  
Sungryul Yu

Forkhead box protein M1 (FOXM1) is a key transcription factor (TF) that regulates a common set of genes related to the cell cycle in various cell types. However, the mechanism by which FOXM1 controls the common gene set in different cellular contexts is unclear. In this study, a comprehensive meta-analysis of genome-wide FOXM1 binding sites in ECC-1, GM12878, K562, MCF-7, and SK-N-SH cell lines was conducted to predict FOXM1-driven gene regulation. Consistent with previous studies, different TF binding motifs were identified at FOXM1 binding sites, while the NFY binding motif was found at 81% of common FOXM1 binding sites in promoters of cell cycle-related genes. The results indicated that FOXM1 might control the gene set through interaction with the NFY proteins, while cell type-specific genes were predicted to be regulated by enhancers with FOXM1 and cell type-specific TFs. We also found that the high expression level of FOXM1 was significantly associated with poor prognosis in nine types of cancer. Overall, these results suggest that FOXM1 is predicted to function as a master regulator of the cell cycle through the interaction of NFY-family proteins, and therefore the inhibition of FOXM1 could be an attractive strategy for cancer therapy.

