Leveraging functional annotation to identify genes associated with complex diseases

Wei Liu; Mo Li; Wenfeng Zhang; Geyu Zhou; Xing Wu; Jiawei Wang; Qiongshi Lu; Hongyu Zhao

doi:10.1371/journal.pcbi.1008315

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008315 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1008315

Author(s):

Wei Liu ◽

Mo Li ◽

Wenfeng Zhang ◽

Geyu Zhou ◽

Xing Wu ◽

...

Keyword(s):

Gene Expression ◽

Linkage Disequilibrium ◽

Quantitative Trait Loci ◽

Quantitative Trait ◽

Complex Traits ◽

Functional Annotation ◽

Statistical Power ◽

Late Onset ◽

Expression Levels ◽

Disease Associated Genes

To increase statistical power to identify genes associated with complex traits, a number of transcriptome-wide association study (TWAS) methods have been proposed using gene expression as a mediating trait linking genetic variations and diseases. These methods first predict expression levels based on inferred expression quantitative trait loci (eQTLs) and then identify expression-mediated genetic effects on diseases by associating phenotypes with predicted expression levels. The success of these methods critically depends on the identification of eQTLs, which may not be functional in the corresponding tissue, due to linkage disequilibrium (LD) and the correlation of gene expression between tissues. Here, we introduce a new method called T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) to identify disease-associated genes leveraging epigenetic information. Through prioritizing SNPs with tissue-specific epigenetic annotation, T-GEN can better identify SNPs that are both statistically predictive and biologically functional. We found that a significantly higher percentage (an increase of 18.7% to 47.2%) of eQTLs identified by T-GEN are inferred to be functional by ChromHMM and more are deleterious based on their Combined Annotation Dependent Depletion (CADD) scores. Applying T-GEN to 207 complex traits, we were able to identify more trait-associated genes (ranging from 7.7% to 102%) than those from existing methods. Among the identified genes associated with these traits, T-GEN can better identify genes with high (>0.99) pLI scores compared to other methods. When T-GEN was applied to late-onset Alzheimer’s disease, we identified 96 genes located at 15 loci, including two novel loci not implicated in previous GWAS. We further replicated 50 genes in an independent GWAS, including one of the two novel loci.
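
The two-stage logic described in this abstract (predict expression from genotypes, then associate predicted expression with the phenotype) can be illustrated with a minimal sketch. This is a generic TWAS outline on simulated data, not T-GEN itself: plain ridge regression stands in for the expression-imputation model, no epigenetic annotation is used, and every name, sample size, and effect size below is a hypothetical chosen for illustration.

```python
import numpy as np

def fit_expression_weights(genotypes, expression, ridge_penalty=1.0):
    """Stage 1: ridge regression of expression on cis-SNP genotypes.

    Ridge is an assumption made for this sketch; it is not the
    annotation-informed model described in the abstract.
    """
    X = genotypes - genotypes.mean(axis=0)
    y = expression - expression.mean()
    n_snps = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge_penalty * np.eye(n_snps), X.T @ y)

def association_z(genotypes, phenotype, weights):
    """Stage 2: regress the phenotype on predicted expression; return a z-score."""
    X = genotypes - genotypes.mean(axis=0)
    predicted = X @ weights
    y = phenotype - phenotype.mean()
    slope = (predicted @ y) / (predicted @ predicted)
    residual = y - slope * predicted
    n = len(y)
    se = np.sqrt((residual @ residual) / (n - 2) / (predicted @ predicted))
    return slope / se

def simulate(seed=0, n_ref=500, n_gwas=2000, n_snps=20):
    """Simulate an eQTL reference panel and a separate GWAS cohort (toy data)."""
    rng = np.random.default_rng(seed)
    true_w = np.zeros(n_snps)
    true_w[[2, 7]] = [1.0, -0.4]  # two hypothetical causal eQTLs
    g_ref = rng.binomial(2, 0.3, size=(n_ref, n_snps)).astype(float)
    expr = g_ref @ true_w + rng.normal(size=n_ref)
    g_gwas = rng.binomial(2, 0.3, size=(n_gwas, n_snps)).astype(float)
    pheno = 0.3 * (g_gwas @ true_w) + rng.normal(size=n_gwas)
    return g_ref, expr, g_gwas, pheno, true_w

if __name__ == "__main__":
    g_ref, expr, g_gwas, pheno, _ = simulate()
    weights = fit_expression_weights(g_ref, expr)
    print("gene-level z-score:", association_z(g_gwas, pheno, weights))
```

In this sketch the weights are learned in one cohort and applied in another, which mirrors why misidentified eQTLs in stage 1 propagate into the gene-level test in stage 2.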

Leveraging functional annotation to identify genes associated with complex diseases

10.1101/529297 ◽

2019 ◽

Author(s):

Wei Liu ◽

Mo Li ◽

Wenfeng Zhang ◽

Geyu Zhou ◽

Xing Wu ◽

...

Keyword(s):

Gene Expression ◽

Alzheimer’s Disease ◽

Complex Traits ◽

Statistical Power ◽

Late Onset ◽

Disease Etiology ◽

Expression Levels ◽

Epigenetic Information ◽

Disease Associated Genes

Abstract: To increase statistical power to identify genes associated with complex traits, a number of transcriptome-wide association study (TWAS) methods have been proposed using gene expression as a mediating trait linking genetic variations and diseases. These methods first predict expression levels based on inferred expression quantitative trait loci (eQTLs) and then identify expression-mediated genetic effects on diseases by associating phenotypes with predicted expression levels. The success of these methods critically depends on the identification of eQTLs, which may not be functional in the corresponding tissue, due to linkage disequilibrium (LD) and the correlation of gene expression between tissues. Here, we introduce a new method called T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) to identify disease-associated genes leveraging epigenetic information. Through prioritizing SNPs with tissue-specific epigenetic annotation, T-GEN can better identify SNPs that are both statistically predictive and biologically functional. We found that a significantly higher percentage (an increase of 18.7% to 47.2%) of eQTLs identified by T-GEN are inferred to be functional by ChromHMM and more are deleterious based on their Combined Annotation Dependent Depletion (CADD) scores. Applying T-GEN to 207 complex traits, we were able to identify more trait-associated genes (ranging from 7.7% to 102%) than those from existing methods. Among the identified genes associated with these traits, T-GEN can better identify genes with high (>0.99) pLI scores compared to other methods. When T-GEN was applied to late-onset Alzheimer’s disease, we identified 96 genes located at 15 loci, including two novel loci not implicated in previous GWAS. We further replicated 50 genes in an independent GWAS, including one of the two novel loci.

Author summary: TWAS-like methods have been widely applied to understand disease etiology using eQTL data and GWAS results. However, it is still challenging to discriminate the true disease-associated genes from those in strong LD with true genes, which is largely due to the misidentification of eQTLs. Here we introduce a novel statistical method named T-GEN to identify disease-associated genes considering epigenetic information. Compared to current TWAS methods, T-GEN can not only identify eQTLs with higher CADD scores and function potentials in gene-expression imputation models, but also identify more disease-associated genes across 207 traits and more genes with high (>0.99) pLI scores. Applying T-GEN in late-onset Alzheimer’s disease identified 96 genes at 15 loci with two novel loci. Among 96 identified genes, 50 genes were further replicated in an independent GWAS.

Leveraging DNA-Methylation Quantitative-Trait Loci to Characterize the Relationship between Methylomic Variation, Gene Expression, and Complex Traits

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2018.09.007 ◽

2018 ◽

Vol 103 (5) ◽

pp. 654-665 ◽

Cited By ~ 29

Author(s):

Eilis Hannon ◽

Tyler J. Gorrie-Stone ◽

Melissa C. Smart ◽

Joe Burrage ◽

Amanda Hughes ◽

...

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Quantitative Trait Loci ◽

Quantitative Trait ◽

Complex Traits ◽

Trait Loci ◽

The Relationship ◽

Methylation Quantitative Trait Loci

Leveraging DNA methylation quantitative trait loci to characterize the relationship between methylomic variation, gene expression and complex traits

10.1101/297176 ◽

2018 ◽

Author(s):

Eilis Hannon ◽

Tyler J Gorrie-Stone ◽

Melissa C Smart ◽

Joe Burrage ◽

Amanda Hughes ◽

...

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Genetic Variation ◽

Quantitative Trait Loci ◽

Quantitative Trait ◽

Complex Traits ◽

Common Genetic Variation ◽

Online Data ◽

The Relationship ◽

Methylation Quantitative Trait Loci

Abstract: Characterizing the complex relationship between genetic, epigenetic and transcriptomic variation has the potential to increase understanding about the mechanisms underpinning health and disease phenotypes. In this study, we describe the most comprehensive analysis of common genetic variation on DNA methylation (DNAm) to date, using the Illumina EPIC array to profile samples from the UK Household Longitudinal Study. We identified 12,689,548 significant DNA methylation quantitative trait loci (mQTL) associations (P < 6.52 × 10⁻¹⁴) occurring between 2,907,234 genetic variants and 93,268 DNAm sites, including a large number not identified using previous DNAm-profiling methods. We demonstrate the utility of these data for interpreting the functional consequences of common genetic variation associated with > 60 human traits, using Summary data–based Mendelian Randomization (SMR) to identify 1,662 pleiotropic associations between 36 complex traits and 1,246 DNAm sites. We also use SMR to characterize the relationship between DNAm and gene expression, identifying 6,798 pleiotropic associations between 5,420 DNAm sites and the transcription of 1,702 genes. Our mQTL database and SMR results are available via a searchable online database (http://www.epigenomicslab.com/online-data-resources/) as a resource to the research community.
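
For readers unfamiliar with SMR, the core calculation can be sketched from summary statistics alone. The sketch below assumes the standard SMR formulation, in which the effect of the exposure (here a DNAm site) on the outcome is estimated as the ratio of the SNP-outcome effect to the SNP-exposure effect, and the test statistic is approximately chi-square distributed with one degree of freedom. The function name and the input numbers are hypothetical, and the accompanying HEIDI heterogeneity test is omitted.

```python
import math

def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Approximate SMR test from summary statistics for one top SNP.

    b_zx, se_zx: SNP effect on the exposure (e.g. DNAm level) and its SE.
    b_zy, se_zy: SNP effect on the outcome (e.g. a complex trait) and its SE.
    Returns (b_xy, t_smr, p_value).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # ratio estimate of the exposure effect on the outcome
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Survival function of a chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2))
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value

if __name__ == "__main__":
    # Made-up illustrative inputs, not values from the study.
    print(smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02))
```

A significant SMR statistic on its own does not separate causality or pleiotropy from linkage, which is why results such as those above are described as pleiotropic associations.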

The Contribution of RNA Decay Quantitative Trait Loci to Inter-Individual Variation in Steady-State Gene Expression Levels

PLoS Genetics ◽

10.1371/journal.pgen.1003000 ◽

2012 ◽

Vol 8 (10) ◽

pp. e1003000 ◽

Cited By ~ 68

Author(s):

Athma A. Pai ◽

Carolyn E. Cain ◽

Orna Mizrahi-Man ◽

Sherryl De Leon ◽

Noah Lewellen ◽

...

Keyword(s):

Gene Expression ◽

Steady State ◽

Quantitative Trait Loci ◽

Individual Variation ◽

Quantitative Trait ◽

RNA Decay ◽

Expression Levels ◽

Trait Loci ◽

Gene Expression Levels

Characterization of expression quantitative trait loci in extensively phenotyped pedigrees ascertained for bipolar disorder

10.1101/031427 ◽

2015 ◽

Author(s):

Christine Peterson ◽

Susan Service ◽

Anna Jasinska ◽

Fuying Gao ◽

Ivette Zelaya ◽

...

Keyword(s):

Gene Expression ◽

Bipolar Disorder ◽

Quantitative Trait Loci ◽

Quantitative Trait ◽

Complex Traits ◽

Expression Quantitative Trait Loci ◽

Genome Wide ◽

Wide Range ◽

Trait Loci

The observation that variants regulating gene expression (expression quantitative trait loci, eQTL) are at a high frequency among SNPs associated with complex traits has made the genome-wide characterization of gene expression an important tool in genetic mapping studies of such traits. As part of a study to identify genetic loci contributing to bipolar disorder and a wide range of BP-related quantitative traits in members of 26 pedigrees from Costa Rica and Colombia, we measured gene expression in lymphoblastoid cell lines derived from 786 pedigree members. The study design enabled us to comprehensively reconstruct the genetic regulatory network in these families, provide estimates of heritability, identify eQTL, evaluate missing heritability for the eQTL, and quantify the number of different alleles contributing to any given locus.

Integrated Genome-Wide Analysis of MicroRNA Expression Quantitative Trait Loci in Pig Longissimus Dorsi Muscle

Frontiers in Genetics ◽

10.3389/fgene.2021.644091 ◽

2021 ◽

Vol 12 ◽

Author(s):

Kaitlyn R. Daza ◽

Deborah Velez-Irizarry ◽

Sebastian Casiró ◽

Juan P. Steibel ◽

Nancy E. Raney ◽

...

Keyword(s):

Gene Expression ◽

Quantitative Trait Loci ◽

miRNA Expression ◽

Quantitative Trait ◽

Complex Traits ◽

Carcass Composition ◽

Longissimus Dorsi ◽

Expression Quantitative Trait Loci ◽

Genome Wide ◽

Trait Loci

Determining mechanisms regulating complex traits in pigs is essential to improve the production efficiency of this globally important protein source. MicroRNAs (miRNAs) are a class of non-coding RNAs known to post-transcriptionally regulate gene expression affecting numerous phenotypes, including those important to the pig industry. To facilitate a more comprehensive understanding of the regulatory mechanisms controlling growth, carcass composition, and meat quality phenotypes in pigs, we integrated miRNA and gene expression data from longissimus dorsi muscle samples with genotypic and phenotypic data from the same animals. We identified 23 miRNA expression Quantitative Trait Loci (miR-eQTL) at the genome-wide level and examined their potential effects on these important production phenotypes through miRNA target prediction, correlation, and colocalization analyses. One miR-eQTL miRNA, miR-874, has target genes that colocalize with phenotypic QTL for 12 production traits across the genome including backfat thickness, dressing percentage, muscle pH at 24 h post-mortem, and cook yield. The results of our study reveal genomic regions underlying variation in miRNA expression and identify miRNAs and genes for future validation of their regulatory effects on traits of economic importance to the global pig industry.

A Haplotype-Based Algorithm for Multilocus Linkage Disequilibrium Mapping of Quantitative Trait Loci With Epistasis

Genetics ◽

10.1093/genetics/163.4.1533 ◽

2003 ◽

Vol 163 (4) ◽

pp. 1533-1548 ◽

Cited By ~ 6

Author(s):

Xiang-Yang Lou ◽

George Casella ◽

Ramon C Littell ◽

Mark C K Yang ◽

Julie A Johnson ◽

...

Keyword(s):

Parameter Estimation ◽

Linkage Disequilibrium ◽

Quantitative Trait Loci ◽

Quantitative Trait ◽

Complex Traits ◽

Association Studies ◽

Natural Populations ◽

Evolutionary Relationship ◽

Linkage Disequilibrium Mapping ◽

Trait Loci

Abstract: For tightly linked loci, cosegregation may lead to nonrandom associations between alleles in a population. Because of its evolutionary relationship with linkage, this phenomenon is called linkage disequilibrium. Today, linkage disequilibrium-based mapping has become a major focus of recent genome research into mapping complex traits. In this article, we present a new statistical method for mapping quantitative trait loci (QTL) of additive, dominant, and epistatic effects in equilibrium natural populations. Our method is based on haplotype analysis of multilocus linkage disequilibrium and exhibits two significant advantages over current disequilibrium mapping methods. First, we have derived closed-form solutions for estimating the marker-QTL haplotype frequencies within the maximum-likelihood framework implemented by the EM algorithm. The allele frequencies of putative QTL and their linkage disequilibria with the markers are estimated by solving a system of regular equations. This procedure has significantly improved the computational efficiency and the precision of parameter estimation. Second, our method can detect marker-QTL disequilibria of different orders and QTL epistatic interactions of various kinds on the basis of a multilocus analysis. This can not only enhance the precision of parameter estimation, but also make it possible to perform whole-genome association studies. We carried out extensive simulation studies to examine the robustness and statistical performance of our method. The application of the new method was validated using a case study from humans, in which we successfully detected significant QTL affecting human body heights. Finally, we discuss the implications of our method for genome projects and its extension to a broader circumstance. The computer program for the method proposed in this article is available at the webpage http://www.ifasstat.ufl.edu/genome/~LD.
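
As background for the EM-based haplotype estimation mentioned in this abstract, the textbook two-locus case can be sketched briefly. This is not the authors' marker-QTL method: it assumes two observed biallelic markers, random mating, and unphased genotype counts, where only the double heterozygote has ambiguous phase. The function names and the layout of the `counts` table are assumptions made for this illustration.

```python
def em_haplotype_frequencies(counts, max_iter=200, tol=1e-10):
    """Estimate two-locus haplotype frequencies from unphased genotype counts.

    counts[i][j] is the number of individuals carrying i copies of allele A
    at locus 1 and j copies of allele B at locus 2 (i, j in {0, 1, 2}).
    """
    total_haps = 2 * sum(sum(row) for row in counts)
    # Haplotype counts contributed by genotypes whose phase is unambiguous.
    base = {
        "AB": 2 * counts[2][2] + counts[2][1] + counts[1][2],
        "Ab": 2 * counts[2][0] + counts[2][1] + counts[1][0],
        "aB": 2 * counts[0][2] + counts[1][2] + counts[0][1],
        "ab": 2 * counts[0][0] + counts[1][0] + counts[0][1],
    }
    n_double_het = counts[1][1]
    freqs = {h: 0.25 for h in base}
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is AB/ab, not Ab/aB.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if (cis + trans) > 0 else 0.5
        # M-step: update frequencies from expected haplotype counts.
        new = {
            "AB": (base["AB"] + n_double_het * w) / total_haps,
            "ab": (base["ab"] + n_double_het * w) / total_haps,
            "Ab": (base["Ab"] + n_double_het * (1 - w)) / total_haps,
            "aB": (base["aB"] + n_double_het * (1 - w)) / total_haps,
        }
        change = max(abs(new[h] - freqs[h]) for h in new)
        freqs = new
        if change < tol:
            break
    return freqs

def ld_coefficient(freqs):
    """Linkage disequilibrium coefficient D = p_AB - p_A * p_B."""
    p_a = freqs["AB"] + freqs["Ab"]
    p_b = freqs["AB"] + freqs["aB"]
    return freqs["AB"] - p_a * p_b

if __name__ == "__main__":
    toy_counts = [[10, 0, 0], [0, 20, 0], [0, 0, 10]]  # made-up data
    estimated = em_haplotype_frequencies(toy_counts)
    print(estimated, ld_coefficient(estimated))
```

The iterative E-step and M-step shown here are the kind of computation that closed-form solutions, such as those the abstract reports, are meant to speed up.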

Expression Quantitative Trait Loci (eQTLs) Associated with Retrotransposons Demonstrate Their Modulatory Effect on the Transcriptome

International Journal of Molecular Sciences ◽

10.3390/ijms22126319 ◽

2021 ◽

Vol 22 (12) ◽

pp. 6319

Author(s):

Sulev Koks ◽

Abigail L. Pfaff ◽

Vivien J. Bubb ◽

John P. Quinn

Keyword(s):

Gene Expression ◽

Quantitative Trait Loci ◽

Quantitative Trait ◽

Complex Traits ◽

Regulatory Function ◽

Study Cohort ◽

Expression Quantitative Trait Loci ◽

Genome Wide ◽

Trait Loci ◽

Whole Transcriptome

Transposable elements (TEs) are repetitive elements that belong to a variety of functional classes and have an important role in shaping genome evolution. Around 50% of the human genome contains TEs, and they have been termed the “dark matter” of the genome because relatively little is known about their function. While TEs have been shown to participate in aberrant gene regulation and the pathogenesis of diseases, only a few studies have explored the systemic effect of TEs on gene expression. In the present study, we analysed whole genome sequences and blood whole transcriptome data from 570 individuals within the Parkinson’s Progression Markers Initiative (PPMI) cohort to identify expression quantitative trait loci (eQTL) regulating genome-wide gene expression associated with TEs. We identified 2132 reference TEs that were polymorphic for their presence or absence in our study cohort. The presence or absence of the TE element could change the expression of the gene or gene clusters from zero to tens of thousands of copies of RNA. The main finding is that many TEs possess very strong regulatory effects, and they have the potential to modulate large genetic networks with hundreds of target genes over the genome. We illustrate the plethora of regulatory mechanisms using examples of their action at the HLA gene cluster and data showing different TEs' convergence to modulate WFS1 gene expression. In conclusion, the presence or absence of polymorphisms of TEs has an eminent genome-wide regulatory function with large effect size at the level of the whole transcriptome. The role of TEs in explaining, in part, the missing heritability for complex traits is convincing and should be considered.

A statistical framework for cross-tissue transcriptome-wide association analysis

10.1101/286013 ◽

2018 ◽

Cited By ~ 4

Author(s):

Yiming Hu ◽

Mo Li ◽

Qiongshi Lu ◽

Haoyi Weng ◽

Jiawei Wang ◽

...

Keyword(s):

Gene Expression ◽

Association Analysis ◽

Complex Traits ◽

Genome Wide Association Study ◽

Late Onset ◽

Imputation Accuracy ◽

Genome Wide Association ◽

Expression Levels ◽

Genome Wide ◽

Specific Tissue

Abstract: Transcriptome-wide association analysis is a powerful approach to studying the genetic architecture of complex traits. A key component of this approach is to build a model to predict (impute) gene expression levels from genotypes from samples with matched genotypes and expression levels in a specific tissue. However, it is challenging to develop robust and accurate imputation models with limited sample sizes for any single tissue. Here, we first introduce a multi-task learning approach to jointly impute gene expression in 44 human tissues. Compared with single-tissue methods, our approach achieved an average 39% improvement in imputation accuracy and generated effective imputation models for an average 120% (range 13%-339%) more genes in each tissue. We then describe a summary statistic-based testing framework that combines multiple single-tissue associations into a single powerful metric to quantify overall gene-trait association at the organism level. When our method, called UTMOST, was applied to analyze genome-wide association results for 50 complex traits (N_total=4.5 million), we were able to identify considerably more genes in tissues enriched for trait heritability, and cross-tissue analysis significantly outperformed single-tissue strategies (p=1.7e-8). Finally, we performed a cross-tissue genome-wide association study for late-onset Alzheimer’s disease (LOAD) and replicated our findings in two independent datasets (N_total=175,776). In total, we identified 69 significant genes, many of which are novel, leading to novel insights on LOAD etiologies.

Genetic and Nongenetic Bases for the L-Shaped Distribution of Quantitative Trait Loci Effects

Genetics ◽

10.1093/genetics/157.4.1773 ◽

2001 ◽

Vol 157 (4) ◽

pp. 1773-1787 ◽

Cited By ~ 8

Author(s):

Bruno Bost ◽

Dominique de Vienne ◽

Frédéric Hospital ◽

Laurence Moreau ◽

Christine Dillmann

Keyword(s):

Linkage Disequilibrium ◽

Quantitative Trait Loci ◽

Population Size ◽

Quantitative Trait ◽

Metabolic Flux ◽

Genetic Model ◽

Small Population ◽

Additive Genetic Model ◽

Metabolic Mechanism ◽

QTL Effects

Abstract: The L-shaped distribution of estimated QTL effects (R²) has long been reported. We recently showed that a metabolic mechanism could account for this phenomenon. But other nonexclusive genetic or nongenetic causes may contribute to generate such a distribution. Using analysis and simulations of an additive genetic model, we show that linkage disequilibrium between QTL, low heritability, and small population size may also be involved, regardless of the gene effect distribution. In addition, a comparison of the additive and metabolic genetic models revealed that estimates of the QTL effects for traits proportional to metabolic flux are far less robust than for additive traits. However, in both models the highest R² values repeatedly correspond to the same set of QTL.
