PhenomeXcan: Mapping the genome to the phenome through the transcriptome

Mapping Intimacies ◽

10.1101/833210 ◽

2019 ◽

Cited By ~ 6

Author(s):

Milton Pividori ◽

Padma S. Rajagopal ◽

Alvaro Barbeira ◽

Yanyu Liang ◽

Owen Melia ◽

...

Keyword(s):

Complex Traits ◽

Large Scale ◽

Target Genes ◽

Genome Wide Association Study ◽

Tissue Expression ◽

P Value ◽

Entire Genome ◽

User Friendly ◽

Biological Context ◽

Trait Associations

Abstract: Large-scale genomic and transcriptomic initiatives offer unprecedented ability to study the biology of complex traits and identify target genes for precision prevention or therapy. Translation to clinical contexts, however, has been slow and challenging due to lack of biological context for identified variant-level associations. Moreover, many translational researchers lack the computational or analytic infrastructures required to fully use these resources. We integrate genome-wide association study (GWAS) summary statistics from multiple publicly available sources and data from Genotype-Tissue Expression (GTEx) v8 using PrediXcan and provide a user-friendly platform for translational researchers based on state-of-the-art algorithms. We develop a novel Bayesian colocalization method, fastENLOC, to prioritize the most likely causal gene-trait associations. Our resource, PhenomeXcan, synthesizes 8.87 million variants from GWAS on 4,091 traits with transcriptome regulation data from 49 tissues in GTEx v8 into an innovative, gene-based resource including 22,255 genes. Across the entire genome/phenome space, we find 65,603 significant associations (Bonferroni-corrected p-value of 5.5 × 10⁻¹⁰), of which 19,579 (29.8 percent) were colocalized (locus regional colocalization probability > 0.1). We successfully replicate associations from PheWAS Catalog (AUC = 0.61) and OMIM (AUC = 0.64). We provide examples of (a) finding novel and underreported genome-to-phenome associations, (b) exploring complex gene-trait clusters within PhenomeXcan, (c) studying phenome-to-phenome relationships between common and rare diseases via further integration of PhenomeXcan with ClinVar, and (d) evaluating potential therapeutic targets.
PhenomeXcan (phenomexcan.org) broadens access to complex genomic and transcriptomic data and empowers translational researchers.

One-Sentence Summary: PhenomeXcan is a gene-based resource of gene-trait associations with biological context that supports translational research.
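
The significance threshold quoted in this abstract is consistent with a plain Bonferroni correction over every gene-trait pair. The sketch below checks that arithmetic. It assumes a family-wise error rate of 0.05 and one test per gene-trait pair, neither of which the abstract states explicitly, and the variable names are illustrative only.

```python
# Counts taken from the abstract above.
genes = 22_255
traits = 4_091
significant = 65_603
colocalized = 19_579

# Assumption: family-wise error rate of 0.05, one test per gene-trait pair.
alpha = 0.05
n_tests = genes * traits
bonferroni_threshold = alpha / n_tests

# Share of significant associations that were also colocalized.
colocalized_percent = 100 * colocalized / significant

print(f"gene-trait pairs tested: {n_tests:,}")
print(f"Bonferroni threshold:    {bonferroni_threshold:.2e}")
print(f"colocalized share:       {colocalized_percent:.1f} percent")
```

Under these assumptions the computed threshold rounds to the 5.5 × 10⁻¹⁰ reported above, and the colocalized share rounds to the reported 29.8 percent.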

PhenomeXcan: Mapping the genome to the phenome through the transcriptome

Science Advances ◽

10.1126/sciadv.aba2083 ◽

2020 ◽

Vol 6 (37) ◽

pp. eaba2083 ◽

Cited By ~ 3

Author(s):

Milton Pividori ◽

Padma S. Rajagopal ◽

Alvaro Barbeira ◽

Yanyu Liang ◽

Owen Melia ◽

...

Keyword(s):

Complex Traits ◽

Large Scale ◽

Genome Wide Association Study ◽

Association Studies ◽

Gene List ◽

Tissue Expression ◽

Mendelian Inheritance ◽

Complex Data ◽

Causal Genes ◽

User Friendly

Large-scale genomic and transcriptomic initiatives offer unprecedented insight into complex traits, but clinical translation remains limited by variant-level associations without biological context and lack of analytic resources. Our resource, PhenomeXcan, synthesizes 8.87 million variants from genome-wide association study summary statistics on 4091 traits with transcriptomic data from 49 tissues in Genotype-Tissue Expression v8 into a gene-based, queryable platform including 22,515 genes. We developed a novel Bayesian colocalization method, fast enrichment estimation aided colocalization analysis (fastENLOC), to prioritize likely causal gene-trait associations. We successfully replicate associations from the phenome-wide association studies (PheWAS) catalog, Online Mendelian Inheritance in Man, and an evidence-based curated gene list. Using PhenomeXcan results, we provide examples of novel and underreported genome-to-phenome associations, complex gene-trait clusters, shared causal genes between common and rare diseases via further integration of PhenomeXcan with ClinVar, and potential therapeutic targets. PhenomeXcan (phenomexcan.org) provides broad, user-friendly access to complex data for translational researchers.

Rare coding variants in 35 genes associate with circulating lipid levels – a multi-ancestry analysis of 170,000 exomes

10.1101/2020.12.22.423783 ◽

2020 ◽

Author(s):

George Hindy ◽

Peter Dornbos ◽

Mark D. Chaffin ◽

Dajiang J. Liu ◽

Minxian Wang ◽

...

Keyword(s):

Complex Traits ◽

Drug Target ◽

Large Scale ◽

Target Genes ◽

Genome Wide Association Study ◽

Sequence Data ◽

Blood Lipid ◽

Nucleotide Polymorphisms ◽

Lipid Levels ◽

Blood Lipid Levels

Summary: Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation using sequence data from >170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels. Ten of these (ALB, SRSF2, JAK2, CREB3L3, TMEM136, VARS, NR1H3, PLA2G12A, PPARG, and STAB1) have not been implicated for lipid levels using rare coding variation in population-based samples. We prioritize 32 genes identified in array-based genome-wide association study (GWAS) loci based on gene-based associations, of which three (EVI5, SH2B3, and PLIN1) had no prior evidence of rare coding variant associations. Most of the associated genes showed evidence of association in multiple ancestries. We also observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes, and for genes closest to GWAS index single nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.

Exploiting the GTEx resources to decipher the mechanisms at GWAS loci

Genome Biology ◽

10.1186/s13059-020-02252-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Alvaro N. Barbeira ◽

Rodrigo Bonazzola ◽

Eric R. Gamazon ◽

Yanyu Liang ◽

...

Keyword(s):

Complex Traits ◽

Target Genes ◽

Genome Wide Association Study ◽

Data Driven ◽

Functional Interpretation ◽

Transcriptome Regulation ◽

Genome Wide ◽

Causal Genes ◽

Dose Dependent ◽

Single Approach

Abstract: The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.

Genome-wide association study revealed loci linked to post-drought recovery and traits related to persistence of smooth bromegrass (Bromus inermis)

10.1101/2021.05.16.444329 ◽

2021 ◽

Author(s):

Fatemeh Saeidnia ◽

Mohammad Mahdi Majidi ◽

Aghafakhr Mirlohi

Keyword(s):

Water Stress ◽

Association Analysis ◽

Complex Traits ◽

Genome Wide Association Study ◽

Agronomic Traits ◽

Genetic Dissection ◽

Smooth Bromegrass ◽

Population Structure Analysis ◽

Stress Environments ◽

Trait Associations

Association analysis has proven to be a powerful tool for genetic dissection of complex traits. This study was conducted to identify marker-trait associations for recovery and persistence, and to find stable associations. A diverse panel of polycross-derived progenies of smooth bromegrass was phenotyped under normal and water-stress conditions during three consecutive years. Association analysis was performed between nine important agronomic traits, along with three seasonal growth activity indices, and 535 SRAP markers. Population structure analysis identified five main subpopulations possessing significant genetic differences. Association analysis using a mixed linear model identified 339 and 233 marker-trait associations under normal and water-stress environments, respectively. Some of these markers were associated with more than one trait, which can be attributed to pleiotropic effects or to a number of tightly linked genes affecting several traits. If the effectiveness of these markers in genetic control of these traits is validated, they could potentially be used to initiate marker-assisted selection and targeted trait introgression in smooth bromegrass under normal and water-stress environments.

Integrative transcriptome imputation reveals tissue-specific and shared biological mechanisms mediating susceptibility to complex traits

10.1101/532929 ◽

2019 ◽

Cited By ~ 1

Author(s):

Wen Zhang ◽

Georgios Voloudakis ◽

Veera M. Rajagopal ◽

Ben Readhead ◽

Joel T. Dudley ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Large Scale ◽

Association Studies ◽

Drug Repurposing ◽

Joint Analysis ◽

Expression Trait ◽

Significant Expression ◽

Insight Into ◽

Trait Associations

Abstract: Transcriptome-wide association studies integrate gene expression data with common risk variation to identify gene-trait associations. By incorporating epigenome data to estimate the functional importance of genetic variation on gene expression, we improve the accuracy of transcriptome prediction and the power to detect significant expression-trait associations. Joint analysis of 14 large-scale transcriptome datasets and 58 traits identifies 13,724 significant expression-trait associations that converge to biological processes and relevant phenotypes in human and mouse phenotype databases. We perform drug repurposing analysis and identify known and novel compounds that mimic or reverse trait-specific changes. We identify genes that exhibit agonistic pleiotropy for genetically correlated traits that converge on shared biological pathways and elucidate distinct processes in disease etiopathogenesis. Overall, this comprehensive analysis provides insight into the specificity and convergence of gene expression on susceptibility to complex traits.

PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics

10.1101/148627 ◽

2017 ◽

Cited By ~ 4

Author(s):

Jie Zheng ◽

Tom G. Richardson ◽

Louise A. C. Millard ◽

Gibran Hemani ◽

Christopher Raistrick ◽

...

Keyword(s):

Complex Traits ◽

Multiple Testing ◽

Large Scale ◽

Genome Wide Association Study ◽

Phenotypic Correlation ◽

Summary Statistics ◽

Multiple Testing Correction ◽

Individual Level ◽

Phenotypic Correlations ◽

Complex Human Traits

Abstract. Background: Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Restricted access to individual-level phenotype data makes it difficult to estimate large-scale phenotypic correlation across the human phenome. State-of-the-art methods, metaCCA and LD score regression, provide an alternative approach to estimate phenotypic correlation using genome-wide association study (GWAS) summary statistics. Results: Here, we present an integrated R toolkit, PhenoSpD, to 1) apply metaCCA (or LD score regression) to estimate phenotypic correlations using GWAS summary statistics; and 2) utilize the estimated phenotypic correlations to inform correction of multiple testing for complex human traits using the spectral decomposition of matrices (SpD). The simulations suggest it is possible to estimate phenotypic correlation using samples with only a partial overlap, but as overlap decreases correlations will attenuate towards zero and multiple testing correction will be more stringent than in perfectly overlapping samples. In a case study, PhenoSpD using GWAS results suggested 324.4 independent tests among 452 metabolites, which is close to the 296 independent tests estimated using true phenotypic correlation. We further applied PhenoSpD to estimate 7,503 pair-wise phenotypic correlations among 123 metabolites using GWAS summary statistics from Kettunen et al., and PhenoSpD suggested 44.9 independent tests for these metabolites. Conclusion: PhenoSpD integrates existing methods and provides a simple and conservative way to reduce dimensionality for complex human traits using GWAS summary statistics, which is particularly valuable for post-GWAS analysis of complex molecular traits. Availability: R code and documentation for PhenoSpD V1.0.0 is available online (https://github.com/MRCIEU/PhenoSpD).
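
Two figures in this abstract can be checked with simple arithmetic: the number of pairwise correlations among 123 metabolites, and the effect of replacing the raw trait count with the estimated number of independent tests in a Bonferroni-style correction. The sketch below is not PhenoSpD code. It assumes a significance level of 0.05, which the abstract does not specify, and the variable names are illustrative.

```python
from math import comb

# Figures quoted in the abstract above.
n_metabolites = 123
effective_tests = 44.9  # independent tests suggested by PhenoSpD

# Unordered pairs among the metabolites: C(123, 2).
n_pairs = comb(n_metabolites, 2)

# Assumption: significance level of 0.05.
alpha = 0.05
naive_threshold = alpha / n_metabolites   # one test per metabolite
spd_threshold = alpha / effective_tests   # corrected for trait correlation

print(f"pairwise correlations: {n_pairs}")
print(f"naive threshold:       {naive_threshold:.2e}")
print(f"SpD-based threshold:   {spd_threshold:.2e}")
```

The pair count matches the 7,503 correlations reported. Because correlated traits yield fewer effective tests than traits, the SpD-based threshold is less stringent than the naive one under this assumption.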

Comprehensive comparison of large-scale tissue expression datasets

10.7287/peerj.preprints.1072v1 ◽

2015 ◽

Cited By ~ 1

Author(s):

Alberto Santos ◽

Kalliopi Tsafou ◽

Christian Stolte ◽

Sune Pletscher-Frankild ◽

Seán I O’Donoghue ◽

...

Keyword(s):

Large Scale ◽

Comprehensive Evaluation ◽

Tissue Expression ◽

Expression Data ◽

Web Interface ◽

Comprehensive Comparison ◽

Single User ◽

The Right ◽

User Friendly ◽

Literature Curation

For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource ( http://tissues.jensenlab.org ), which makes all the scored and integrated data available through a single user-friendly web interface.

Five genes influenced by obesity may contribute to the development of thyroid cancer through the regulation of insulin levels

PeerJ ◽

10.7717/peerj.9302 ◽

2020 ◽

Vol 8 ◽

pp. e9302

Author(s):

Jiaming Chen ◽

Hongbao Cao ◽

Meng Lian ◽

Jugao Fang

Keyword(s):

Data Mining ◽

Thyroid Cancer ◽

Pathway Analysis ◽

Large Scale ◽

Target Genes ◽

Underlying Mechanism ◽

P Value ◽

Healthy Controls ◽

Functional Profile ◽

Obesity Genes

Previous studies indicate that obesity is an important contributor to the progression of thyroid cancer (TC), but knowledge of the underlying mechanism is limited. Here, we hypothesize that molecules affected by obesity may play roles in the development of TC. To test this hypothesis, we first conducted large-scale literature-based data mining to identify genes influenced by obesity and genes related to TC. Then, a mega-analysis was conducted to study the expression changes of the obesity-specific genes in the case of TC, using 16 independent TC array-expression datasets (783 TC cases and 439 healthy controls). After that, pathway analysis was performed to explore the functional profile of the selected target genes and their potential connections with TC. We identified 1,036 genes associated with TC and 534 regulated by obesity, demonstrating a significant overlap (N = 176, p-value = 4.07e−112). Five of the 358 obesity-specific genes (FABP4, CFD, GHR, TNFRSF11B, and LTF) presented significantly decreased expression in TC patients (LFC < −1.44 and p-value < 1e−7). Multiple literature-based pathways were identified where obesity could promote the pathologic development of TC through the regulation of these five genes and INS levels. The five obesity genes uncovered could be novel genes that play roles in the etiology of TC through the modulation of INS levels.

EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species

Nucleic Acids Research ◽

10.1093/nar/gkz980 ◽

2019 ◽

Cited By ~ 8

Author(s):

Tianshun Gao ◽

Jiang Qian

Keyword(s):

Large Scale ◽

Target Gene ◽

Target Genes ◽

Cell Types ◽

Regulatory Elements ◽

Tissue Cell ◽

Normal Tissues ◽

Genome Wide ◽

Wide Range ◽

User Friendly

Abstract: Enhancers are distal cis-regulatory elements that activate the transcription of their target genes. They regulate a wide range of important biological functions and processes, including embryogenesis, development, and homeostasis. As more large-scale technologies are developed for enhancer identification, a comprehensive database for enhancer annotation based on various genome-wide profiling datasets across different species is highly desirable. Here, we present an updated database, EnhancerAtlas 2.0 (http://www.enhanceratlas.org/indexv2.php), covering 586 tissue/cell types that include a large number of normal tissues, cancer cell lines, and cells at different development stages across nine species. Overall, the database contains 13 494 603 enhancers, which were obtained from 16 055 datasets using 12 high-throughput experiment methods (e.g. H3K4me1/H3K27ac, DNase-seq/ATAC-seq, P300, POLR2A, CAGE, ChIA-PET, GRO-seq, STARR-seq and MPRA). The updated version is a huge expansion of the first version, which only contains the enhancers in human cells. In addition, we predicted enhancer–target gene relationships in human, mouse and fly. Finally, users can search enhancers and enhancer–target gene relationships through five user-friendly, interactive modules. We believe the new annotation of enhancers in EnhancerAtlas 2.0 will help users perform useful functional analysis of enhancers in various genomes.

A tissue-specific collaborative mixed model for jointly analyzing multiple tissues in transcriptome-wide association studies

10.1101/789396 ◽

2019 ◽

Cited By ~ 2

Author(s):

Xingjie Shi ◽

Xiaoran Chai ◽

Yi Yang ◽

Qing Cheng ◽

Yuling Jiao ◽

...

Keyword(s):

Complex Traits ◽

Mixed Model ◽

Target Genes ◽

Association Studies ◽

Specific Role ◽

Tissue Specific ◽

Candidate Target ◽

Different Tissues ◽

Trait Associations

Abstract: Transcriptome-wide association studies (TWAS) integrate expression quantitative trait loci (eQTL) studies with genome-wide association studies (GWASs) to prioritize candidate target genes for complex traits. Several statistical methods have recently been proposed to improve the performance of TWAS in gene prioritization by integrating the expression regulatory information imputed from multiple tissues, and they have substantially improved the ability to detect gene-trait associations. The major limitation of these methods is that they cannot be used to elucidate the specific functional effects of candidate genes across different tissues. Here, we propose a tissue-specific collaborative mixed model (TisCoMM) for TWAS, leveraging the co-regulation of genetic variations across different tissues explicitly via a unified probabilistic model. TisCoMM not only performs hypothesis testing to prioritize gene-trait associations, but also detects the tissue-specific role of candidate target genes in complex traits. To make use of widely available GWAS summary statistics, we extend TisCoMM to use summary-level data, namely, TisCoMM-S2. Using extensive simulation studies, we show that type I error is controlled at the nominal level, the statistical power of identifying associated genes is greatly improved, and false positive rate (FPR) for non-causal tissues is well controlled at decent levels. We further illustrate the benefits of our methods in applications to summary-level GWAS data of 33 complex traits. Notably, apart from better identifying potential trait-associated genes, we can elucidate the tissue-specific role of candidate target genes. The follow-up pathway analysis from tissue-specific genes for asthma shows that the immune system plays an essential function for asthma development in both thyroid and lung tissues.