Allele-Specific QTL Fine-Mapping with PLASMA

2019 ◽  
Author(s):  
Austin T. Wang ◽  
Anamay Shetty ◽  
Edward O’Connor ◽  
Connor Bell ◽  
Mark M. Pomerantz ◽  
...  

Although quantitative trait locus (QTL) associations have been identified for many molecular traits such as gene expression, it remains challenging to distinguish the causal nucleotide from nearby variants. In addition to traditional QTLs by association, allele-specific (AS) QTLs are a powerful measure of cis-regulation that are largely concordant with traditional QTLs, and can be less susceptible to technical/environmental noise. However, existing asQTL analysis methods do not produce probabilities of causality for each marker, and do not take into account correlations among markers at a locus in linkage disequilibrium (LD). We introduce PLASMA (PopuLation Allele-Specific MApping), a novel, LD-aware method that integrates QTL and asQTL information to fine-map causal regulatory variants while drawing power from both the number of individuals and the number of allelic reads per individual. We demonstrate through simulations that PLASMA successfully detects causal variants over a wide range of genetic architectures. We apply PLASMA to RNA-Seq data from 524 kidney tumor samples and show that over 17 percent of loci can be fine-mapped to within 5 causal variants, compared to less than 2 percent of loci using existing QTL-based fine-mapping. PLASMA furthermore achieves greater power at 50 samples than conventional QTL fine-mapping does at over 500 samples. Overall, PLASMA achieves a 6.9-fold reduction in median 95% credible set size compared to existing QTL-based fine-mapping. We additionally apply PLASMA to H3K27AC ChIP-Seq from 28 prostate tumor/normal samples and demonstrate that PLASMA is able to prioritize markers even at small sample sizes, with PLASMA achieving a 1.3-fold reduction in median 95% credible set sizes over existing QTL-based fine-mapping.
Variants in the PLASMA credible sets for RNA-Seq and ChIP-Seq were enriched for open chromatin and chromatin looping (respectively) at a comparable or greater degree than credible variants from existing methods, while containing far fewer markers. Our results demonstrate how integrating AS activity can substantially improve the detection of causal variants from existing molecular data and at low sample size.

2019 ◽  
Author(s):  
Jennifer L Asimit ◽  
Daniel B Rainbow ◽  
Mary D Fortune ◽  
Nastasiya F Grinberg ◽  
Linda S Wicker ◽  
...  

Thousands of genetic variants have been associated with human disease risk, but linkage disequilibrium (LD) hinders fine-mapping the causal variants. We show that stepwise regression, and, to a lesser extent, stochastic search fine-mapping can mis-identify as causal, SNPs which jointly tag distinct causal variants. Frequent sharing of causal variants between immune-mediated diseases (IMD) motivated us to develop a computationally efficient multinomial fine-mapping (MFM) approach that borrows information between diseases in a Bayesian framework. We show that MFM has greater accuracy than single disease analysis when shared causal variants exist, and negligible loss of precision otherwise. Applying MFM to data from six IMD revealed causal variants undetected in individual disease analysis, including in IL2RA where we confirm functional effects of multiple causal variants using allele-specific expression in sorted CD4+ T cells from genotype-selected individuals. MFM has the potential to increase fine-mapping resolution in related diseases enabling the identification of associated cellular and molecular phenotypes.


2015 ◽  
Author(s):  
Natsuhiko Kumasaka ◽  
Andrew Knights ◽  
Daniel Gaffney

When cellular traits are measured using high-throughput DNA sequencing, quantitative trait loci (QTLs) manifest at two levels: population-level differences between individuals and allelic differences between cis-haplotypes within individuals. We present RASQUAL (Robust Allele Specific QUAntitation and quality controL), a novel statistical approach for association mapping that integrates genetic effects and robust modelling of biases in next generation sequencing (NGS) data within a single, probabilistic framework. RASQUAL substantially improves causal variant localisation and sensitivity of association detection over existing methods in RNA-seq, DNaseI-seq and ChIP-seq data. We illustrate how RASQUAL can be used to maximise association detection by generating the first map of chromatin accessibility QTLs (caQTLs) in a European population using ATAC-seq. Despite a modest sample size, we identified 2,706 independent caQTLs (FDR 10%) and illustrate how RASQUAL's improved causal variant localisation provides powerful information for fine-mapping disease-associated variants. We also map “multipeak” caQTLs, identical genetic associations found across multiple, independent open chromatin regions, and illustrate how genetic signals in ATAC-seq data can be used to link distal regulatory elements with gene promoters. Our results highlight how joint modelling of population and allele-specific genetic signals can improve functional interpretation of noncoding variation.


2016 ◽  
Author(s):  
Andrew Anand Brown ◽  
Ana Viñuela ◽  
Olivier Delaneau ◽  
Tim Spector ◽  
Kerrin Small ◽  
...  

Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying the causal variants themselves remains difficult. Complete knowledge of all genetic variants, as provided by whole genome sequence (WGS), will help, but is currently financially prohibitive for well-powered GWAS studies. To explore the advantages of WGS in a well-powered setting, we performed eQTL mapping using WGS and RNA-seq, and showed that the lead eQTL variants called using WGS are more likely to be causal. We derived properties of the causal variant from simulation studies, and used these to propose a method for implicating likely causal SNPs. This method predicts that 25% to 70% of the causal variants lie in open chromatin regions, depending on tissue and experiment. Finally, we identify a set of high confidence causal variants and show that they are more enriched in GWAS associations than other eQTL. Of these, we find 65 associations with GWAS traits and show examples where the gene implicated by expression has been functionally validated as relevant for complex traits.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Yanyu Liang ◽  
François Aguet ◽  
Alvaro N. Barbeira ◽  
Kristin Ardlie ◽  
Hae Kyung Im

Genetic studies of the transcriptome help bridge the gap between genetic variation and phenotypes. To maximize the potential of such studies, efficient methods to identify expression quantitative trait loci (eQTLs) and perform fine-mapping and genetic prediction of gene expression traits are needed. Current methods that leverage both total read counts and allele-specific expression to identify eQTLs are generally computationally intractable for large transcriptomic studies. Here, we describe a unified framework that addresses these needs and is scalable to thousands of samples. Using simulations and data from GTEx, we demonstrate its calibration and performance. For example, mixQTL shows a power gain equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage. To showcase the potential of mixQTL, we apply it to 49 GTEx tissues and find 20% additional eQTLs (FDR < 0.05, per tissue) that are significantly more enriched among trait-associated variants and candidate cis-regulatory elements compared to the standard approach.


2019 ◽  
Author(s):  
Jiaxin Fan ◽  
Jian Hu ◽  
Chenyi Xue ◽  
Hanrui Zhang ◽  
Muredach P. Reilly ◽  
...  

Allele-specific expression (ASE) analysis, which quantifies the relative expression of two alleles in a diploid individual, is a powerful tool for identifying cis-regulated gene expression variations that underlie phenotypic differences among individuals. Existing methods for gene-level ASE detection analyze one individual at a time, therefore wasting shared information across individuals. Failure to accommodate such shared information not only loses power, but also makes it difficult to interpret results across individuals. However, ASE detection across individuals is challenging because the data often include individuals that are either heterozygous or homozygous for the unobserved cis-regulatory SNP, leading to heterogeneity in ASE as only those heterozygous individuals are informative for ASE, whereas those homozygous individuals have balanced expression. To simultaneously model multi-individual information and account for such heterogeneity, we developed ASEP, a mixture model with a subject-specific random effect accounting for multi-SNP correlations within the same gene. ASEP is able to detect gene-level ASE under one condition and differential ASE between two conditions (e.g., pre- versus post-treatment). Extensive simulations have demonstrated the convincing performance of ASEP under a wide range of scenarios. We further applied ASEP to RNA-seq data of human macrophages, and identified genes showing evidence of differential ASE pre- versus post-stimulation, which were extended through findings in cardiometabolic trait-relevant genome-wide association studies. To the best of our knowledge, ASEP is the first method for gene-level ASE detection at the population level. With the growing adoption of RNA-seq, we believe ASEP will be well-suited for various ASE studies for human diseases.


2019 ◽  
Author(s):  
Anna Hutchinson ◽  
Hope Watson ◽  
Chris Wallace

Genome Wide Association Studies (GWAS) have successfully identified thousands of loci associated with human diseases. Bayesian genetic fine-mapping studies aim to identify the specific causal variants within GWAS loci responsible for each association, reporting credible sets of plausible causal variants, which are interpreted as containing the causal variant with some “coverage probability”. Here, we use simulations to demonstrate that the coverage probabilities are over-conservative in most fine-mapping situations. We show that this is because fine-mapping data sets are not randomly selected from amongst all causal variants, but from amongst causal variants with larger effect sizes. We present a method to re-estimate the coverage of credible sets using rapid simulations based on the observed, or estimated, SNP correlation structure; we call this the “corrected coverage estimate”. This is extended to find “corrected credible sets”, which are the smallest set of variants such that their corrected coverage estimate meets the target coverage. We use our method to improve the resolution of a fine-mapping study of type 1 diabetes. We found that in 27 out of 39 associated genomic regions our method could reduce the number of potentially causal variants to consider for follow-up, and found that none of the 95% or 99% credible sets required the inclusion of more variants, a pattern matched in simulations of well powered GWAS. Crucially, our correction method requires only GWAS summary statistics and remains accurate when SNP correlations are estimated from a large reference panel. Using our method to improve the resolution of fine-mapping studies will enable more efficient expenditure of resources in the follow-up process of annotating the variants in the credible set to determine the implicated genes and pathways in human diseases.
Author summary: Pinpointing specific genetic variants within the genome that are causal for human diseases is difficult due to complex correlation patterns existing between variants. Consequently, researchers typically prioritise a set of plausible causal variants for functional validation; these sets of putative causal variants are called “credible sets”. We find that the probabilistic interpretation that these credible sets do indeed contain the true causal variant is variable, in that the reported probabilities often underestimate the true coverage of the causal variant in the credible set. We have developed a method to provide researchers with a “corrected coverage estimate” that the true causal variant appears in the credible set, and this has been extended to find “corrected credible sets”, allowing for more efficient allocation of resources in the expensive follow-up laboratory experiments. We used our method to reduce the number of genetic variants to consider as causal candidates for follow-up in 27 genomic regions that are associated with type 1 diabetes.
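As a point of reference for the term "credible set" used in the abstract above, the following is a minimal sketch of the conventional (uncorrected) construction: rank variants by posterior probability of causality and keep adding them until the cumulative probability reaches the target coverage. It is not the corrected-coverage method the abstract describes. The function name `credible_set`, the variant IDs, and the posterior values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Return the smallest set of variants whose posteriors sum to >= coverage.

    Assumes `posteriors` maps variant IDs to posterior probabilities of
    causality that sum to (approximately) 1 under a single-causal-variant model.
    """
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[str] = []
    total = 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break  # target coverage reached; stop adding variants
    return chosen


# Hypothetical posteriors for four variants at one locus.
example = {"rs1": 0.60, "rs2": 0.30, "rs3": 0.06, "rs4": 0.04}
print(credible_set(example, coverage=0.95))
```

Smaller credible sets at the same coverage mean fewer variants to carry into functional follow-up, which is why credible set size is the resolution measure several abstracts on this page report.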


2018 ◽  
Author(s):  
Jennifer Zou ◽  
Farhad Hormozdiari ◽  
Brandon Jew ◽  
Jason Ernst ◽  
Jae Hoon Sul ◽  
...  

Many disease risk loci identified in genome-wide association studies are present in non-coding regions of the genome. It is hypothesized that these variants affect complex traits by acting as expression quantitative trait loci (eQTLs) that influence expression of nearby genes. This indicates that many causal variants for complex traits are likely to be causal variants for gene expression. Hence, identifying causal variants for gene expression is important for elucidating the genetic basis of not only gene expression but also complex traits. However, detecting causal variants is challenging due to complex genetic correlation among variants known as linkage disequilibrium (LD) and the presence of multiple causal variants within a locus. Although several fine-mapping approaches have been developed to overcome these challenges, they may produce large sets of putative causal variants when true causal variants are in high LD with many non-causal variants. In eQTL studies, there is an additional source of information that can be used to improve fine-mapping called allele-specific expression (ASE) that measures imbalance in gene expression due to different alleles. In this work, we develop a novel statistical method that leverages both ASE and eQTL information to detect causal variants that regulate gene expression. We illustrate through simulations and application to the Genotype-Tissue Expression (GTEx) dataset that our method identifies the true causal variants with higher specificity than an approach that uses only eQTL information. In the GTEx dataset, our method achieves a median reduction rate of 11% in the number of putative causal variants.


2020 ◽  
Vol 106 (2) ◽  
pp. 170-187 ◽  
Author(s):  
Austin T. Wang ◽  
Anamay Shetty ◽  
Edward O’Connor ◽  
Connor Bell ◽  
Mark M. Pomerantz ◽  
...  

Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 311
Author(s):  
Zhenqiu Liu

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion) without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
M. Joseph Tomlinson ◽  
Shawn W. Polson ◽  
Jing Qiu ◽  
Juniper A. Lake ◽  
William Lee ◽  
...  

Differential abundance of allelic transcripts in a diploid organism, commonly referred to as allele-specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assists in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large subset of the samples (n = 68). ASE analysis was performed using custom software called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average ~174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~24,000 (~14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.
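The binomial test mentioned above can be illustrated with a short sketch. This is not VADT's code: the function name `ase_binomial_p`, the null allelic ratio of 0.5, and the read counts are assumptions made for illustration, and the abstract does not say how VADT filters SNPs or corrects for multiple testing. The idea is that, at a heterozygous biallelic SNP with no allelic imbalance, reads should split between the two alleles roughly evenly, so a strongly skewed split yields a small p-value.

```python
from math import comb


def ase_binomial_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one SNP.

    Suitable for modest read depths; very deep coverage would need a
    log-space or library implementation to avoid floating-point overflow.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    # Probability of each possible reference-read count under the null.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum all outcomes at least as unlikely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical read counts: a balanced SNP and a strongly imbalanced one.
print(ase_binomial_p(10, 10))
print(ase_binomial_p(20, 0))
```

In a real analysis the per-SNP p-values would still need multiple-testing correction across the many informative SNPs tested in each tissue.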

