Fine-mapping cellular QTLs with RASQUAL and ATAC-seq

2015
Author(s): Natsuhiko Kumasaka, Andrew Knights, Daniel Gaffney

When cellular traits are measured using high-throughput DNA sequencing, quantitative trait loci (QTLs) manifest at two levels: population-level differences between individuals and allelic differences between cis-haplotypes within individuals. We present RASQUAL (Robust Allele Specific QUAntitation and quality controL), a novel statistical approach for association mapping that integrates genetic effects and robust modelling of biases in next-generation sequencing (NGS) data within a single, probabilistic framework. RASQUAL substantially improves causal variant localisation and sensitivity of association detection over existing methods in RNA-seq, DNaseI-seq and ChIP-seq data. We illustrate how RASQUAL can be used to maximise association detection by generating the first map of chromatin accessibility QTLs (caQTLs) in a European population using ATAC-seq. Despite a modest sample size, we identified 2,706 independent caQTLs (FDR 10%) and illustrate how RASQUAL's improved causal variant localisation provides powerful information for fine-mapping disease-associated variants. We also map “multipeak” caQTLs, identical genetic associations found across multiple, independent open chromatin regions, and illustrate how genetic signals in ATAC-seq data can be used to link distal regulatory elements with gene promoters. Our results highlight how joint modelling of population and allele-specific genetic signals can improve functional interpretation of noncoding variation.

2021
Author(s): Carla J Cohen, Connor Davidson, Carlo Selmi, Paul Bowness, Julian C Knight, ...

Background: Ankylosing Spondylitis (AS) is a common form of inflammatory spinal arthritis with a complex aetiology and high heritability, involving more than 100 genetic associations. These include several AS-associated single nucleotide polymorphisms (SNPs) upstream of RUNX3, which encodes the multifunctional RUNT-related transcription factor (TF) 3. The lead associated SNP rs6600247 (p = 2.6 × 10^-15) lies ~13kb upstream of the RUNX3 promoter adjacent to a c-MYC TF binding-site. The effects of rs6600247 genotype on DNA binding and chromosome looping were investigated by electrophoretic mobility gel shift assays (EMSA), Western blotting-EMSA (WEMSA) and Chromosome Conformation Capture (3C). Results: Interrogation of ENCODE published data showed open chromatin in the region overlapping rs6600247 in primary human CD14+ monocytes, in contrast to the Jurkat T cell line or primary T-cells. The rs6600247 AS-risk allele is predicted to specifically disrupt a c-MYC binding-site. Using a 50bp DNA probe spanning rs6600247, there was consistently less binding to the AS-risk “C” allele of both purified c-MYC protein and nuclear extracts (NE) from monocyte-like U937 cells. WEMSA on U937 NE and purified c-MYC protein confirmed these differences (n=2; p<0.05). 3C experiments demonstrated negligible interaction between the region encompassing rs6600247 and the RUNX3 promoter. A stronger interaction frequency was demonstrated between the RUNX3 promoter and the previously characterised AS-associated SNP rs4648889. Conclusions: The lead SNP rs6600247, located in an enhancer-like region upstream of the RUNX3 promoter, modulates c-MYC binding. However, the region encompassing rs6600247 has rather limited physical interaction with the promoter of RUNX3. In contrast, a clear chromatin looping event between the region encompassing rs4648889 and the RUNX3 promoter was observed. These data provide further evidence for complexity in the regulatory elements upstream of the RUNX3 promoter and the involvement of RUNX3 transcriptional regulation in AS.


2021, Vol 12 (1)
Author(s): Yanyu Liang, François Aguet, Alvaro N. Barbeira, Kristin Ardlie, Hae Kyung Im

Genetic studies of the transcriptome help bridge the gap between genetic variation and phenotypes. To maximize the potential of such studies, efficient methods to identify expression quantitative trait loci (eQTLs) and perform fine-mapping and genetic prediction of gene expression traits are needed. Current methods that leverage both total read counts and allele-specific expression to identify eQTLs are generally computationally intractable for large transcriptomic studies. Here, we describe a unified framework that addresses these needs and is scalable to thousands of samples. Using simulations and data from GTEx, we demonstrate its calibration and performance. For example, mixQTL shows a power gain equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage. To showcase the potential of mixQTL, we apply it to 49 GTEx tissues and find 20% additional eQTLs (FDR < 0.05, per tissue) that are significantly more enriched among trait-associated variants and candidate cis-regulatory elements compared to the standard approach.


2020
Author(s): Masami Ando-Kuri, Rodrigo G. Arzate-Mejía, Jorg Morf, Jonathan Cairns, Cesar A. Poot-Hernández, ...

Circadian gene expression is essential for organisms to adjust cellular responses and anticipate daily changes in the environment. In addition to its physiological importance, the clock circuit represents an ideal, temporally resolved, system to study transcription regulation. Here, we analysed changes in spatial mouse liver chromatin conformation using genome-wide and promoter-capture Hi-C alongside daily oscillations in gene transcription in mouse liver. We found circadian topologically associated domains switched assignments to the transcriptionally active, open chromatin compartment and the inactive compartment at different hours of the day while their boundaries stably maintain their structure over time. Individual circadian gene promoters displayed maximal chromatin contacts at times of peak transcriptional output and the expression of circadian genes and contacted transcribed regulatory elements, or other circadian genes, was phase-coherent. Anchor sites of promoter chromatin loops were enriched in binding sites for liver nuclear receptors and transcription factors, some exclusively present in either rhythmic or stable contacts. The circadian 3D chromatin maps provided here identify the scales of chromatin conformation that parallel oscillatory gene expression and protein factors specifically associated with circadian or stable chromatin configurations.


2019
Author(s): Austin T. Wang, Anamay Shetty, Edward O’Connor, Connor Bell, Mark M. Pomerantz, ...

Although quantitative trait locus (QTL) associations have been identified for many molecular traits such as gene expression, it remains challenging to distinguish the causal nucleotide from nearby variants. In addition to traditional QTLs by association, allele-specific (AS) QTLs are a powerful measure of cis-regulation that are largely concordant with traditional QTLs, and can be less susceptible to technical/environmental noise. However, existing asQTL analysis methods do not produce probabilities of causality for each marker, and do not take into account correlations among markers at a locus in linkage disequilibrium (LD). We introduce PLASMA (PopuLation Allele-Specific MApping), a novel, LD-aware method that integrates QTL and asQTL information to fine-map causal regulatory variants while drawing power from both the number of individuals and the number of allelic reads per individual. We demonstrate through simulations that PLASMA successfully detects causal variants over a wide range of genetic architectures. We apply PLASMA to RNA-Seq data from 524 kidney tumor samples and show that over 17 percent of loci can be fine-mapped to within 5 causal variants, compared to less than 2 percent of loci using existing QTL-based fine-mapping. PLASMA furthermore achieves a greater power at 50 samples than conventional QTL fine-mapping does at over 500 samples. Overall, PLASMA achieves a 6.9-fold reduction in median 95% credible set size compared to existing QTL-based fine-mapping. We additionally apply PLASMA to H3K27AC ChIP-Seq from 28 prostate tumor/normal samples and demonstrate that PLASMA is able to prioritize markers even at small sample sizes, with PLASMA achieving a 1.3-fold reduction in median 95% credible set sizes over existing QTL-based fine-mapping. Variants in the PLASMA credible sets for RNA-Seq and ChIP-Seq were enriched for open chromatin and chromatin looping (respectively) at a comparable or greater degree than credible variants from existing methods, while containing far fewer markers. Our results demonstrate how integrating AS activity can substantially improve the detection of causal variants from existing molecular data and at low sample size.


Science, 2020, Vol 369 (6503), pp. 561-565
Author(s): Siwei Zhang, Hanwen Zhang, Yifan Zhou, Min Qiao, Siming Zhao, ...

Most neuropsychiatric disease risk variants are in noncoding sequences and lack functional interpretation. Because regulatory sequences often reside in open chromatin, we reasoned that neuropsychiatric disease risk variants may affect chromatin accessibility during neurodevelopment. Using human induced pluripotent stem cell (iPSC)–derived neurons that model developing brains, we identified thousands of genetic variants exhibiting allele-specific open chromatin (ASoC). These neuronal ASoCs were partially driven by altered transcription factor binding, overrepresented in brain gene enhancers and expression quantitative trait loci, and frequently associated with distal genes through chromatin contacts. ASoCs were enriched for genetic variants associated with brain disorders, enabling identification of functional schizophrenia risk variants and their cis-target genes. This study highlights ASoC as a functional mechanism of noncoding neuropsychiatric risk variants, providing a powerful framework for identifying disease causal variants and genes.
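Allele-specific signal of the kind described above is often assessed by asking whether the reads covering a heterozygous site depart from a 50:50 split between the two alleles. The abstract does not say which test the authors used, so the following is only a minimal sketch under that assumption: an exact two-sided binomial test, with the function name `allelic_imbalance_pvalue` and its parameters chosen here for illustration.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial p-value against a 50:50 allelic ratio.

    Hypothetical helper for illustration; not the method of any cited study.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    # Probability of every possible reference-read count under p = 0.5.
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the outcomes that are at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # A site with 30 reference and 10 alternate reads.
    print(allelic_imbalance_pvalue(30, 10))
```

In practice such per-site p-values would be corrected for multiple testing across all heterozygous sites, and reference-mapping bias would need separate handling; neither is covered by this sketch.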


Genes, 2019, Vol 10 (6), pp. 433
Author(s): Kate Megquier, Diane P. Genereux, Jessica Hekman, Ross Swofford, Jason Turner-Maier, ...

Dogs are an unparalleled natural model for investigating the genetics of health and disease, particularly for complex diseases like cancer. Comprehensive genomic annotation of regulatory elements active in healthy canine tissues is crucial both for identifying candidate causal variants and for designing functional studies needed to translate genetic associations into disease insight. Currently, canine geneticists rely primarily on annotations of the human or mouse genome that have been remapped to dog, an approach that misses dog-specific features. Here, we describe BarkBase, a canine epigenomic resource available at barkbase.org. BarkBase hosts data for 27 adult tissue types, with biological replicates, and for one sample of up to five tissues sampled at each of four carefully staged embryonic time points. RNA sequencing is complemented with whole genome sequencing and with assay for transposase-accessible chromatin using sequencing (ATAC-seq), which identifies open chromatin regions. By including replicates, we can more confidently discern tissue-specific transcripts and assess differential gene expression between tissues and timepoints. By offering data in easy-to-use file formats, through a visual browser modeled on similar genomic resources for human, BarkBase introduces a powerful new resource to support comparative studies in dogs and humans.


2022, Vol 12
Author(s): Carla J. Cohen, Connor Davidson, Carlo Selmi, Paul Bowness, Julian C. Knight, ...

Background: Ankylosing Spondylitis (AS) is a common form of inflammatory spinal arthritis with a complex aetiology and high heritability, involving more than 100 genetic associations. These include several AS-associated single nucleotide polymorphisms (SNPs) upstream of RUNX3, which encodes the multifunctional RUNT-related transcription factor (TF) 3. The lead associated SNP rs6600247 (p = 2.6 × 10^-15) lies ∼13kb upstream of the RUNX3 promoter adjacent to a c-MYC TF binding-site. The effects of rs6600247 genotype on DNA binding and chromosome looping were investigated by electrophoretic mobility gel shift assays (EMSA), Western blotting-EMSA (WEMSA) and Chromosome Conformation Capture (3C). Results: Interrogation of ENCODE published data showed open chromatin in the region overlapping rs6600247 in primary human CD14+ monocytes, in contrast to the Jurkat T cell line or primary human T-cells. The rs6600247 AS-risk allele is predicted to specifically disrupt a c-MYC binding-site. Using a 50bp DNA probe spanning rs6600247 we consistently observed reduced binding to the AS-risk “C” allele of both purified c-MYC protein and nuclear extracts (NE) from monocyte-like U937 cells. WEMSA on U937 NE and purified c-MYC protein confirmed these differences (n = 3; p < 0.05). 3C experiments demonstrated negligible interaction between the region encompassing rs6600247 and the RUNX3 promoter. A stronger interaction frequency was demonstrated between the RUNX3 promoter and the previously characterised AS-associated SNP rs4648889. Conclusion: The lead SNP rs6600247, located in an enhancer-like region upstream of the RUNX3 promoter, modulates c-MYC binding. However, the region encompassing rs6600247 has rather limited physical interaction with the promoter of RUNX3. In contrast, a clear chromatin looping event between the region encompassing rs4648889 and the RUNX3 promoter was observed. These data provide further evidence for complexity in the regulatory elements upstream of the RUNX3 promoter and the involvement of RUNX3 transcriptional regulation in AS.


2020, Vol 29 (R1), pp. R81-R88
Author(s): Anna Hutchinson, Jennifer Asimit, Chris Wallace

Whilst thousands of genetic variants have been associated with human traits, identifying the subset of those variants that are causal requires a further ‘fine-mapping’ step. We review the basic fine-mapping approach, which is computationally fast and requires only summary data, but depends on an assumption of a single causal variant per associated region which is recognized as biologically unrealistic. We discuss different ways that the approach has been built upon to accommodate multiple causal variants in a region and to incorporate additional layers of functional annotation data. We further review methods for simultaneous fine-mapping of multiple datasets, either exploiting different linkage disequilibrium (LD) structures across ancestries or borrowing information between distinct but related traits. Finally, we look to the future and the opportunities that will be offered by increasingly accurate maps of causal variants for a multitude of human traits.
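The basic approach mentioned above, fine-mapping from summary data under a single-causal-variant assumption, can be sketched in a few lines. This is an illustration under stated assumptions, not the review's own code: it uses Wakefield-style approximate Bayes factors with a normal prior on effect size, equal prior probability for every variant, and a prior standard deviation of 0.2 chosen arbitrarily here. The function names and the toy numbers are hypothetical.

```python
import math


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor for association versus no association."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the true effect (assumed value)
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z * z


def posterior_probs(betas, ses, prior_sd: float = 0.2):
    """Posterior probability that each variant is the single causal one."""
    labfs = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(labfs)  # subtract the maximum before exponentiating for stability
    weights = [math.exp(l - top) for l in labfs]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(pps, coverage: float = 0.95):
    """Indices of the smallest set of variants whose summed probability reaches coverage."""
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pps[i]
        if cumulative >= coverage:
            break
    return chosen


if __name__ == "__main__":
    # Toy summary statistics for three variants in one region.
    pps = posterior_probs([0.50, 0.10, 0.05], [0.05, 0.05, 0.05])
    print(pps, credible_set(pps))
```

Because the posterior probabilities are normalised over the variants in the region, the calculation is only meaningful if exactly one of them is causal, which is the assumption the review describes as biologically unrealistic.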


2020
Author(s): Marat Sabirov, Olga Kyrchanova, Galina V. Pokholkova, Artem Bonchuk, Natalia Klimenko, ...

The architectural protein Pita is critical for Drosophila embryogenesis and predominantly binds to gene promoters and insulators. In particular, Pita is involved in the organization of boundaries between regulatory domains that control the expression of three hox genes in the Bithorax complex (BX-C). The best-characterized partner for Pita is the BTB/POZ-domain containing protein CP190. Using in vitro pull-down analysis, we precisely mapped two unstructured regions of Pita that interact with the BTB domain of CP190. Then we constructed transgenic lines expressing the wild-type Pita protein and mutant variants lacking CP190-interacting regions. The expression of the mutant protein completely complemented the null pita mutation. ChIP-seq experiments with wild-type and mutant embryos showed that the deletion of the CP190-interacting regions did not significantly affect the binding of the mutant Pita protein to most chromatin sites. However, the mutant Pita protein does not support the ability of multimerized Pita sites to prevent cross-talk between the iab-6 and iab-7 regulatory domains that activate the expression of Abdominal-B (Abd-B), one of the genes in the BX-C. The recruitment of a chimeric protein consisting of the DNA-binding domain of GAL4 and the CP190-interacting region of Pita to the GAL4 binding sites on the polytene chromosomes of larvae induces the formation of a new interband, which is a consequence of the formation of open chromatin in this region. These results suggest that the interaction with CP190 is required for the primary Pita activities, but other architectural proteins may also recruit CP190 in flies expressing only the mutant Pita protein.

Author Summary: Pita is required for Drosophila development and binds specifically to a long motif in active promoters and insulators. Pita belongs to the Drosophila family of zinc-finger architectural proteins, which also includes Su(Hw) and CTCF, which is conserved among higher eukaryotes. The architectural proteins maintain the active state of regulatory elements and the long-distance interactions between them. The CP190 protein is recruited to chromatin through interaction with the architectural proteins. Here we mapped two regions in Pita that are required for interaction with the CP190 protein. We have demonstrated that the CP190-interacting region of Pita can maintain nucleosome-free open chromatin and is critical for Pita-mediated enhancer blocking activity. At the same time, interaction with CP190 is not required for the in vivo function of the mutant Pita protein, which binds to the same regions of the genome as the wild-type protein. Unexpectedly, we found that CP190 was still associated with most of the genome regions bound by the mutant Pita protein, which suggested that other architectural proteins were continuing to recruit CP190 to these regions. These results support a model in which the regulatory elements are composed of combinations of binding sites that interact with several architectural proteins with similar functions.


2019
Author(s): Siwei Zhang, Hanwen Zhang, Min Qiao, Yifan Zhou, Siming Zhao, ...

Functional interpretation of noncoding disease variants, which likely regulate gene expression, has been challenging. Chromatin accessibility strongly influences gene expression during neurodevelopment; however, to what extent genetic variants can alter chromatin accessibility in the context of brain disorders/traits is unknown. Using human induced pluripotent stem cell (iPSC)-derived neurons as a neurodevelopmental model, we identified abundant open-chromatin regions absent in adult brain samples and thousands of genetic variants exhibiting allele-specific open-chromatin (ASoC). ASoC variants are overrepresented in brain enhancers, transcription-factor-binding sites, and quantitative-trait-loci associated with gene expression, histone modification, and DNA methylation. Notably, compared to open chromatin regions and other commonly used functional annotations, neuronal ASoC variants showed much stronger enrichments of risk variants for various brain disorders/traits. Our study provides the first snapshot of the neuronal ASoC landscape and a powerful framework for prioritizing functional disease variants.

One Sentence Summary: Allele-specific open chromatin informs functional disease variants.

