Accurate eQTL prioritization with an ensemble-based framework

2016
Author(s): Haoyang Zeng, Matthew D. Edwards, Yuchun Guo, David K. Gifford

Abstract Expression quantitative trait loci (eQTL) analysis links sequence variants with changes in gene expression and has proven a successful approach for fine-mapping variants causal for complex traits and for understanding their pathogenesis. In this work, we present an ensemble-based computational framework, EnsembleExpr, for eQTL prioritization. When trained on data from massively parallel reporter assays (MPRA), EnsembleExpr accurately predicts reporter expression levels from DNA sequence and identifies sequence variants that exhibit significant allele-specific reporter expression. This framework achieved the best performance in the “eQTL-causal SNPs” open challenge of the Fourth Critical Assessment of Genome Interpretation (CAGI 4). We envision EnsembleExpr as a powerful resource for interpreting non-coding regulatory variants and prioritizing disease-associated mutations for downstream validation.
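To make the ensemble idea concrete, here is a minimal sketch. It averages the reporter-expression predictions of several base models for the reference and alternative sequence of each variant, then ranks variants by the absolute predicted allelic difference. The abstract does not specify EnsembleExpr's base models, features, or weighting. The two toy models (GC content and a count of a hypothetical "TGACTCA" motif), the unweighted mean, and the function names are all illustrative assumptions, not the authors' method.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# A base model maps a DNA sequence to a predicted reporter expression level.
# These are hypothetical stand-ins, not EnsembleExpr's actual components.
Model = Callable[[str], float]


def ensemble_predict(models: List[Model], seq: str) -> float:
    """Average the predictions of all base models (unweighted ensemble)."""
    return mean(m(seq) for m in models)


def rank_variants(
    models: List[Model], variants: Dict[str, Tuple[str, str]]
) -> List[Tuple[str, float]]:
    """Score each variant as ensemble(alt) - ensemble(ref); rank by |score|."""
    scores = {
        vid: ensemble_predict(models, alt) - ensemble_predict(models, ref)
        for vid, (ref, alt) in variants.items()
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def gc_model(seq: str) -> float:
    """Toy model: fraction of G/C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def motif_model(seq: str) -> float:
    """Toy model: number of occurrences of a hypothetical activating motif."""
    return float(seq.count("TGACTCA"))


if __name__ == "__main__":
    variants = {
        "var1": ("AATGACTCAAA", "AATGACGCAAA"),  # alt allele disrupts the toy motif
        "var2": ("AAAAAAAAAAA", "AAAAACAAAAA"),  # alt allele only adds one C
    }
    for vid, score in rank_variants([gc_model, motif_model], variants):
        print(vid, round(score, 3))
```

Averaging is the simplest way to combine models. A real ensemble might instead learn per-model weights on held-out MPRA data.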

2021
Author(s): Kousuke Mouri, Michael H. Guo, Carl G. de Boer, Greg A. Newby, Matteo Gentili, ...

Genome-wide association studies have uncovered hundreds of autoimmune disease-associated loci; however, the causal genetic variant(s) within each locus are mostly unknown. Here, we perform high-throughput allele-specific reporter assays to prioritize disease-associated variants for five autoimmune diseases. By examining variants that both promote allele-specific reporter expression and are located in accessible chromatin, we identify 60 putatively causal variants that enrich for statistically fine-mapped variants by up to 57.8-fold. We introduced the risk allele of a prioritized variant (rs72928038) into a human T cell line and deleted the orthologous sequence in mice, both resulting in reduced BACH2 expression. Naive CD8 T cells from mice containing the deletion had reduced expression of genes that suppress activation and maintain stemness. Our results represent an example of an effective approach for prioritizing variants and studying their physiologically relevant effects.
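The "up to 57.8-fold" figure is a fold enrichment. A common definition is the fraction of the prioritized set that is statistically fine-mapped, divided by the same fraction in a background set. The sketch below assumes that definition. The abstract does not state the exact background or fine-mapping threshold used, and the counts in the example are hypothetical.

```python
def fold_enrichment(
    hits_in_set: int, set_size: int, hits_in_background: int, background_size: int
) -> float:
    """(fraction of prioritized variants that are fine-mapped) divided by
    (fraction of background variants that are fine-mapped)."""
    if set_size <= 0 or background_size <= 0 or hits_in_background <= 0:
        raise ValueError("set sizes and background hit count must be positive")
    return (hits_in_set / set_size) / (hits_in_background / background_size)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 60 prioritized variants are fine-mapped,
    # versus 200 of 20,000 background variants.
    print(round(fold_enrichment(12, 60, 200, 20_000), 2))  # prints 20.0
```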


2020
Vol 21 (1)
Author(s): Yuhua Zhang, Corbin Quick, Ketian Yu, Alvaro Barbeira, ...

Abstract We propose a new computational framework, probabilistic transcriptome-wide association study (PTWAS), to investigate causal relationships between gene expression and complex traits. PTWAS applies established principles of instrumental variable analysis and takes advantage of probabilistic eQTL annotations to delineate and tackle the unique challenges that arise in TWAS. PTWAS not only has higher power than existing methods but also provides new functionality for evaluating the causal assumptions and estimating tissue- or cell-type-specific gene-to-trait effects. We illustrate the power of PTWAS by analyzing eQTL data across 49 tissues from GTEx (v8) and GWAS summary statistics for 114 complex traits.
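A TWAS combines per-variant eQTL weights for a gene with GWAS summary z-scores, accounting for linkage disequilibrium (LD). The sketch below computes the generic TWAS burden statistic z = wᵀz_GWAS / sqrt(wᵀRw), where R is the LD correlation matrix among the instruments. This is not PTWAS's own estimator. According to the abstract, PTWAS derives its weights from probabilistic eQTL annotations and adds checks of the causal assumptions, neither of which is reproduced here. The weights, z-scores, and LD values in the example are hypothetical.

```python
import numpy as np


def twas_z(weights, gwas_z, ld) -> float:
    """Generic TWAS burden statistic: w^T z / sqrt(w^T R w).

    weights: eQTL weights for the gene's instrument variants (hypothetical here)
    gwas_z:  GWAS z-scores for the same variants
    ld:      LD correlation matrix R among those variants
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    R = np.asarray(ld, dtype=float)
    denom = float(w @ R @ w)
    if denom <= 0:
        raise ValueError("w^T R w must be positive")
    return float(w @ z) / float(np.sqrt(denom))


if __name__ == "__main__":
    # Two variants in moderate LD, equally weighted.
    print(twas_z([1.0, 1.0], [1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))
```

The denominator rescales the weighted sum so that the statistic is standard normal under the null of no association. This is why the LD matrix is required even when only summary statistics are available.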


BMC Genomics
2014
Vol 15 (1)
pp. 471
Author(s): Yehudit Hasin-Brumshtein, Farhad Hormozdiari, Lisa Martin, Atila van Nas, Eleazar Eskin, ...

2014
Author(s): Gregory A Moyerbrailean, Chris T Harvey, Cynthia A Kalita, Xiaoquan Wen, Francesca Luca, ...

Ongoing large-scale experimental characterization is crucial for determining all regulatory sequences, yet we do not know which genetic variants within those regions are non-silent. Here, we present a novel analysis that integrates sequence and DNase I footprinting data for 653 samples to predict the impact of a sequence change on transcription factor binding for a panel of 1,372 motifs. Most genetic variants in footprints (5,810,227) show no evidence of allele-specific binding (ASB). In contrast, functional genetic variants predicted by our computational models are highly enriched for ASB (3,217 SNPs at 20% FDR). Compared with silent non-coding genetic variants, functional ones are 1.22-fold enriched for GWAS traits, have lower allele frequencies, and affect footprints that are more distal to promoters or active in fewer tissues. Finally, integrating these annotations into 18 GWAS meta-studies improves the identification of likely causal SNPs and of transcription factors relevant for complex traits.
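To show what an ASB call "at 20% FDR" involves, the sketch below applies a simple exact binomial test to reference and alternative read counts at heterozygous sites, using a 50:50 null. It then applies Benjamini-Hochberg to control the false discovery rate at 0.2. The abstract does not describe the authors' ASB test, which may model overdispersion or other effects. The binomial test, the BH procedure, and the read counts below are illustrative assumptions, not the study's pipeline.

```python
from math import comb
from typing import List


def binom_two_sided_p(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test against a 50:50 ref/alt null.
    The null is symmetric, so the two-sided p-value is twice one tail."""
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def benjamini_hochberg(pvals: List[float], fdr: float) -> List[bool]:
    """Benjamini-Hochberg step-up: flag which tests are rejected at `fdr`."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * fdr:
            max_k = rank
    # Reject every hypothesis whose rank is <= k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical (ref, alt) read counts at three heterozygous footprint SNPs.
    counts = [(18, 2), (11, 9), (25, 5)]
    pvals = [binom_two_sided_p(r, a) for r, a in counts]
    print(benjamini_hochberg(pvals, fdr=0.2))
```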


2019
Author(s): Daniel Esposito, Jochen Weile, Jay Shendure, Lea M Starita, Anthony T Papenfuss, ...

Abstract Multiplex Assays of Variant Effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here we present MaveDB, a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first of these applications, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.


2021
Author(s): Hao Lu, Luyu Ma, Lei Li, Cheng Quan, Yiming Lu, ...

Noncoding variants constitute the majority of trait-associated genomic variation; however, identifying functional noncoding variants remains a challenge in human genetics, and a method that systematically assesses the impact of regulatory variants on gene expression and links them to their potential target genes is still lacking. Here we introduce a deep neural network (DNN)-based computational framework, RegVar, that can accurately predict the tissue-specific impact of noncoding regulatory variants on target genes. We show that, by robustly learning the genomic characteristics of massive variant-gene expression associations in a variety of human tissues, RegVar vastly surpasses all current noncoding variant prioritization methods in predicting regulatory variants under different circumstances. These features make RegVar an excellent framework for assessing the regulatory impact of any variant on its putative target genes in a variety of tissues. RegVar is available as a webserver at http://regvar.cbportal.org/.


2021
Author(s): Matteo D'Antonio, Timothy D. Arthur, Jennifer P. Nguyen, Hiroko Matsui, Agnieszka D'Antonio-Chronowska, ...

The causal variants and genes underlying thousands of cardiac GWAS signals have yet to be identified. To address this, we leveraged spatiotemporal information from 966 cardiac RNA-seq samples and performed an expression quantitative trait locus (eQTL) analysis, detecting ~26,000 eQTL signals associated with more than 11,000 eGenes and 7,000 eIsoforms. Approximately 2,500 eQTLs were associated with specific cardiac stages, organs, tissues and/or cell types. Colocalization and fine mapping of eQTL and GWAS signals for five cardiac traits in the UK Biobank identified variants with high posterior probabilities of being causal in 210 GWAS loci. Over 50 of these loci represent novel functionally annotated cardiac GWAS signals. Our study provides a comprehensive resource mapping regulatory variants that act in a spatiotemporal, context-specific manner to regulate cardiac gene expression; this resource can be used to functionally annotate genomic loci associated with cardiac traits and disease.
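The basic building block of an eQTL analysis like the one above is a single-variant association test. This is commonly an ordinary least-squares regression of normalized expression on genotype dosage (0, 1 or 2 copies of the alternative allele). The sketch below uses `scipy.stats.linregress` for that test. It omits the covariates (for example ancestry components and hidden expression factors) that real pipelines include, and it does not cover the colocalization or fine-mapping steps. The dosages and expression values are hypothetical.

```python
import numpy as np
from scipy import stats


def eqtl_test(dosage, expression):
    """Regress expression on genotype dosage (0/1/2 alternative alleles).

    Returns (effect size per alternative allele, two-sided p-value for slope != 0).
    No covariates are modeled in this sketch.
    """
    x = np.asarray(dosage, dtype=float)
    y = np.asarray(expression, dtype=float)
    if x.shape != y.shape or x.size < 3:
        raise ValueError("need matching dosage/expression vectors of length >= 3")
    if np.all(x == x[0]):
        raise ValueError("genotype is monomorphic in this sample")
    result = stats.linregress(x, y)
    return result.slope, result.pvalue


if __name__ == "__main__":
    # Hypothetical dosages and normalized expression for six samples.
    dosage = [0, 0, 1, 1, 2, 2]
    expression = [0.9, 1.1, 2.9, 3.1, 4.9, 5.1]
    beta, p = eqtl_test(dosage, expression)
    print(f"beta = {beta:.3f}, p = {p:.2e}")
```

Repeating this test for every variant-gene (or variant-isoform) pair in a cis window, then correcting for multiple testing, yields eGene and eIsoform calls of the kind counted in the abstract.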


2021
Vol 22 (1)
Author(s): Fei Zhang, Jinfeng Wu, Nir Sade, Si Wu, Aiman Egbaria, ...

Abstract
Background: Drought is a major environmental disaster that causes crop yield loss worldwide. Metabolites are involved in various environmental stress responses of plants. However, the genetic control of the metabolome underlying crop adaptation to environmental stress remains elusive.
Results: Here, we perform non-targeted metabolic profiling of leaves from 385 maize natural inbred lines grown under well-watered and drought-stressed conditions. A total of 3890 metabolites are identified, 1035 of which are differentially produced between well-watered and drought-stressed conditions, representing effective indicators of maize drought response and tolerance. Genetic dissection reveals associations between these metabolites and thousands of single-nucleotide polymorphisms (SNPs), representing 3415 metabolite quantitative trait loci (mQTLs) and 2589 candidate genes. Of these mQTLs, 78.6% (2684/3415) are novel drought-responsive QTLs. Expression QTL (eQTL) analysis of leaf transcriptomes from 197 maize natural inbred lines reveals the regulatory variants that control expression of the candidate genes. Integrated metabolic and transcriptomic assays identify dozens of environment-specific hub genes and their gene-metabolite regulatory networks. Comprehensive genetic and molecular studies reveal the roles and mechanisms of two hub genes, Bx12 and ZmGLK44, in regulating maize metabolite biosynthesis and drought tolerance.
Conclusion: Our studies reveal the first population-level metabolomes of crop drought response and uncover the natural variation and genetic control of these metabolomes underlying crop drought adaptation, demonstrating that multi-omics is a powerful strategy for dissecting the genetic mechanisms of complex traits in crops.

