An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People

Matthew R. Nelson; Daniel Wegmann; Margaret G. Ehm; Darren Kessner; Pamela St. Jean; Claudio Verzilli; Judong Shen; Zhengzheng Tang; Silviu-Alin Bacanu; Dana Fraser; Liling Warren; Jennifer Aponte; Matthew Zawistowski; Xiao Liu; Hao Zhang; Yong Zhang; Jun Li; Yun Li; Li Li; Peter Woollard; Simon Topp; Matthew D. Hall; Keith Nangle; Jun Wang; Gonçalo Abecasis; Lon R. Cardon; Sebastian Zöllner; John C. Whittaker; Stephanie L. Chissoe; John Novembre; Vincent Mooser

doi:10.1126/science.1217876

Science ◽

10.1126/science.1217876 ◽

2012 ◽

Vol 337 (6090) ◽

pp. 100-104 ◽

Cited By ~ 488

Keyword(s):

Population Growth ◽

Drug Targets ◽

Complex Disease ◽

Target Genes ◽

Rare Variants ◽

Disease Risk ◽

Growth Parameters ◽

Purifying Selection ◽

Human Populations ◽

Functional Variants

Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
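The density and frequency-class figures in this abstract rest on simple arithmetic over allele counts. The Python sketch below illustrates that arithmetic under stated assumptions and is not the authors' pipeline. The helper names `bases_per_variant` and `frequency_class` are hypothetical, and so are the MAF cut-offs: below 0.5% counts as rare, 0.5% to under 5% as low-frequency, and 5% or above as common. With 14,002 diploid individuals there are 28,004 sampled chromosomes, which sets the allele-frequency denominator.

```python
from collections import Counter

N_CHROMOSOMES = 2 * 14_002  # diploid sample of 14,002 people -> 28,004 chromosomes


def bases_per_variant(target_length_bp: int, n_variant_sites: int) -> float:
    """Average number of sequenced bases per observed variant site."""
    if n_variant_sites <= 0:
        raise ValueError("need at least one variant site")
    return target_length_bp / n_variant_sites


def frequency_class(minor_allele_count: int, n_chromosomes: int = N_CHROMOSOMES) -> str:
    """Bin a variant by minor allele frequency (MAF).

    The cut-offs are illustrative assumptions, not taken from the paper:
    MAF < 0.5% -> "rare", 0.5% to < 5% -> "low-frequency", otherwise "common".
    """
    maf = minor_allele_count / n_chromosomes
    if maf < 0.005:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"


if __name__ == "__main__":
    # Toy numbers only: 100 variant sites observed in 1,700 sequenced bases.
    print(bases_per_variant(1_700, 100))  # 17.0
    # Tally frequency classes for a few hypothetical minor allele counts.
    print(Counter(frequency_class(mac) for mac in [1, 2, 500, 3_000]))
```

A per-base density like this depends on how many bases were actually sequenced at adequate coverage. The denominator should therefore be the callable target length, not the nominal gene length.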

Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis

10.1101/827923 ◽

2019 ◽

Author(s):

Jing Yang ◽

Amanda McGovern ◽

Paul Martin ◽

Kate Duffus ◽

Xiangyu Ge ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Gene Expression ◽

T Cells ◽

Complex Disease ◽

Target Genes ◽

Disease Risk ◽

Association Studies ◽

DNA Interaction ◽

Genome Wide Association Studies ◽

Causal Genes

Genome-wide association studies have identified genetic variation contributing to complex disease risk. However, assigning causal genes and mechanisms has been more challenging because disease-associated variants are often found in distal regulatory regions with cell-type-specific behaviours. Here, we collect ATAC-seq, Hi-C, Capture Hi-C and nuclear RNA-seq data in stimulated CD4+ T cells over 24 hours to identify functional enhancers regulating gene expression. We characterise changes in DNA interaction and activity dynamics that correlate with changes in gene expression, and find that the strongest correlations are observed within 200 kb of promoters. Using rheumatoid arthritis as an example of a T-cell-mediated disease, we demonstrate interactions of expression quantitative trait loci with target genes, and confirm assigned genes or show complex interactions for 20% of disease-associated loci, including FOXO1, which we confirm using CRISPR/Cas9.
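The abstract reports that correlations between regulatory activity and expression are strongest within 200 kb of promoters. The Python sketch below shows a simplified version of that kind of analysis, assuming regulatory elements and transcription start sites (TSSs) are given as `(chromosome, position)` tuples and that each has a time course measured at matching time points. It pairs elements with genes inside a distance window and scores each pair by Pearson correlation. The names `pairs_within_window` and `pearson`, the loci and the values are all hypothetical. This is not the authors' method, which also drew on Hi-C and Capture Hi-C contacts.

```python
import math
from typing import Dict, List, Tuple

Locus = Tuple[str, int]  # (chromosome, position); an assumed representation


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def pairs_within_window(
    elements: Dict[str, Locus], tss: Dict[str, Locus], window: int = 200_000
) -> List[Tuple[str, str]]:
    """All (element, gene) pairs on the same chromosome within `window` bp of the TSS."""
    return [
        (e, g)
        for e, (echr, epos) in elements.items()
        for g, (gchr, gpos) in tss.items()
        if echr == gchr and abs(epos - gpos) <= window
    ]


if __name__ == "__main__":
    elements = {"peak1": ("chr1", 1_000_000), "peak2": ("chr1", 1_500_000)}
    tss = {"GENE_X": ("chr1", 1_150_000)}  # hypothetical gene and coordinates
    activity = {"peak1": [1.0, 2.0, 3.0, 4.0], "peak2": [4.0, 1.0, 3.0, 2.0]}
    expression = {"GENE_X": [2.0, 4.1, 5.9, 8.0]}
    # Only peak1 (150 kb from the TSS) falls inside the 200 kb window.
    for e, g in pairs_within_window(elements, tss):
        print(e, g, round(pearson(activity[e], expression[g]), 3))
```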

Ultra-rare variants drive substantial cis-heritability of human gene expression

10.1101/219238 ◽

2017 ◽

Cited By ~ 7

Author(s):

Ryan D. Hernandez ◽

Lawrence H. Uricchio ◽

Kevin Hartman ◽

Chun Ye ◽

Andrew Dahl ◽

...

Keyword(s):

Complex Disease ◽

Rare Variants ◽

Human Gene ◽

Purifying Selection ◽

Sequencing Data ◽

Inference Procedure ◽

Mendelian Diseases ◽

Complex Phenotypes ◽

Regulatory Architecture ◽

Human Genes

The vast majority of human mutations have minor allele frequencies (MAF) under 1%, with the plurality observed only once (i.e., “singletons”). While Mendelian diseases are predominantly caused by rare alleles, their cumulative contribution to complex phenotypes remains largely unknown. We develop and rigorously validate an approach to jointly estimate the contribution of all alleles, including singletons, to phenotypic variation. We apply our approach to transcriptional regulation, an intermediate between genetic variation and complex disease. Using whole genome DNA and lymphoblastoid cell line RNA sequencing data from 360 European individuals, we conservatively estimate that singletons contribute ~25% of cis-heritability across genes (dwarfing the contributions of other frequencies). Strikingly, the majority (~76%) of singleton heritability derives from ultra-rare variants absent from thousands of additional samples. We develop a novel inference procedure to demonstrate that our results are consistent with rampant purifying selection shaping the regulatory architecture of most human genes.
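The allele-frequency terms used above (MAF under 1%, singletons) can be computed directly from genotype dosages. The Python sketch below assumes diploid genotypes coded as 0/1/2 copies of the alternate allele, one value per individual. Here "singleton" means the minor allele is seen exactly once within the sample. The abstract's "ultra-rare" class also requires absence from thousands of external samples, which this sketch does not model. The function names and genotypes are hypothetical.

```python
from typing import List, Sequence


def minor_allele_count(dosages: Sequence[int]) -> int:
    """Copies of the minor allele, from per-individual 0/1/2 alternate-allele dosages."""
    alt = sum(dosages)
    total = 2 * len(dosages)  # diploid: two chromosomes per individual
    return min(alt, total - alt)


def minor_allele_frequency(dosages: Sequence[int]) -> float:
    """MAF = minor allele count / number of sampled chromosomes."""
    return minor_allele_count(dosages) / (2 * len(dosages))


def singleton_fraction(variants: List[Sequence[int]]) -> float:
    """Fraction of polymorphic sites whose minor allele is seen exactly once."""
    polymorphic = [v for v in variants if minor_allele_count(v) > 0]
    if not polymorphic:
        return 0.0
    return sum(minor_allele_count(v) == 1 for v in polymorphic) / len(polymorphic)


if __name__ == "__main__":
    n = 360  # matches the sample size in the abstract; the genotypes are made up
    singleton = [1] + [0] * (n - 1)
    low_freq = [2] * 10 + [0] * (n - 10)
    common = [1] * 200 + [0] * (n - 200)
    print(minor_allele_frequency(singleton))  # about 0.0014, i.e. below 1%
    print(singleton_fraction([singleton, low_freq, common]))
```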

Genetic Analyses of Blood Cell Structure for Biological and Pharmacological Inference

10.1101/2020.01.30.927483 ◽

2020 ◽

Author(s):

Parsa Akbari ◽

Dragana Vuckovic ◽

Tao Jiang ◽

Kousik Kundu ◽

Roman Kreuzhuber ◽

...

Keyword(s):

Flow Cytometry ◽

Drug Targets ◽

Complex Disease ◽

Cell Function ◽

Disease Risk ◽

Secretory Granules ◽

Cell Structure ◽

Cell Types ◽

Nucleic Acid Content ◽

Genetic Associations

Thousands of genetic associations with phenotypes of blood cells are known, but few involve phenotypes relevant to cell function. We performed GWAS of 63 flow-cytometry phenotypes, including measures of cell granularity, nucleic acid content and reactivity, in 39,656 participants in the INTERVAL study, identifying 2,172 variant-trait associations. These include associations mediated by functional cellular structures such as secretory granules, which are implicated in vascular, thrombotic, inflammatory and neoplastic diseases. By integrating our results with epigenetic data and with signals from molecular-abundance and disease GWAS, we infer the hematopoietic origins of population phenotypic variation and identify the transcription factor FOG2 as a regulator of platelet α-granularity. We show how flow cytometry genetics can suggest cell types mediating complex disease risk and suggest efficacious drug targets, presenting daclizumab and vedolizumab in autoimmune disease as positive controls. Finally, we add to existing evidence supporting IL7/IL7-R as drug targets for multiple sclerosis.

TVAR: Assessing Tissue-specific Functional Effects of Non-coding Variants with Deep Learning

10.21203/rs.3.rs-113771/v1 ◽

2020 ◽

Author(s):

Hai Yang ◽

Rui Chen ◽

Quan Wang ◽

Qiang Wei ◽

Ying Ji ◽

...

Keyword(s):

Rare Variants ◽

Disease Risk ◽

Superior Performance ◽

Cancer Type ◽

Tissue Specific ◽

Functional Variants ◽

Cell Type Specific ◽

Artery Disease ◽

Coding Variants

Analysis of whole-genome sequencing (WGS) data for the genetics of disease remains challenging owing to the lack of accurate functional annotation of noncoding variants, especially rare ones. As eQTLs have been extensively implicated in the genetics of human diseases, we hypothesize that noncoding rare variants discovered by WGS play a regulatory role in predisposition to disease. Using thousands of tissue- and cell type-specific epigenomic features, we propose TVAR, a multi-label-learning-based deep neural network that predicts the functionality of noncoding variants in the genome based on eQTLs across 49 human tissues in GTEx. TVAR learns the relationships between high-dimensional epigenomics and eQTLs across tissues, taking the correlation among tissues into account to learn shared and tissue-specific eQTL effects. As a result, TVAR outputs tissue-specific annotations, with an average of 0.77 across these tissues. We evaluate TVAR’s performance on four complex diseases (coronary artery disease, breast cancer, type 2 diabetes and schizophrenia) using TVAR’s tissue-specific annotations, and observe superior performance in predicting functional variants, for both common and rare variants, compared with five existing state-of-the-art tools. We further evaluate TVAR’s G-score, a scoring scheme across all tissues, on ClinVar, fine-mapped GWAS loci and massively parallel reporter assay (MPRA)-validated variants, and observe consistently better performance of TVAR compared with other competing tools.

Calibrated rare variant genetic risk scores for complex disease prediction using large exome sequence repositories

10.1101/2020.02.03.931519 ◽

2020 ◽

Author(s):

Ricky Lali ◽

Michael Chong ◽

Arghavan Omidi ◽

Pedrum Mohammadi-Shemirani ◽

Ann Le ◽

...

Keyword(s):

Rare Variant ◽

Genetic Risk ◽

Complex Disease ◽

Rare Variants ◽

Disease Risk ◽

Genetic Risk Score ◽

Risk Scores ◽

Considerable Proportion ◽

Asian Populations ◽

Artery Disease

Rare variants are collectively numerous and may underlie a considerable proportion of complex disease risk. However, identifying genuine rare variant associations is challenging because of small effect sizes, technical artefacts, and heterogeneity in population structure. We hypothesized that rare variant burden across a large number of genes can be combined into a predictive rare variant genetic risk score (RVGRS). We propose a novel method (RV-EXCALIBER) that leverages summary-level data from a large public exome sequencing database (gnomAD) as controls and robustly calibrates rare variant burden to account for the aforementioned biases. An RVGRS was found to be strongly associated with coronary artery disease (CAD) in European and South Asian populations. The calibrated RVGRS captures the aggregate effect of rare variants through a polygenic model of inheritance, identifies 1.5% of the population with substantial risk of early CAD, and confers risk even after adjusting for known Mendelian CAD genes, clinical risk factors, and common variant gene scores.
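The idea of combining rare-variant burden across many genes into a single score can be illustrated with a weighted allele count. The Python sketch below is deliberately simplified and is not RV-EXCALIBER, because it omits the gnomAD-based calibration entirely. The gene names, per-gene weights and the helper names `burden_score` and `top_fraction` are hypothetical. `top_fraction` flags the highest-scoring individuals, echoing the abstract's top-1.5% group.

```python
from typing import Dict, List


def burden_score(dosages_by_gene: Dict[str, List[int]], gene_weights: Dict[str, float]) -> float:
    """Weighted count of qualifying rare alleles carried by one individual.

    dosages_by_gene maps a gene to the individual's 0/1/2 dosages at that gene's
    qualifying rare variants; gene_weights maps a gene to a hypothetical weight.
    Genes without a weight are skipped.
    """
    score = 0.0
    for gene, dosages in dosages_by_gene.items():
        weight = gene_weights.get(gene)
        if weight is not None:
            score += weight * sum(dosages)
    return score


def top_fraction(scores: List[float], fraction: float) -> List[int]:
    """Indices of the highest-scoring `fraction` of individuals (at least one)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_top])


if __name__ == "__main__":
    weights = {"GENE_A": 0.5, "GENE_B": 0.25}  # hypothetical genes and weights
    person = {"GENE_A": [1, 0], "GENE_B": [0, 2], "GENE_C": [1]}
    print(burden_score(person, weights))  # 1.0 (GENE_C has no weight and is ignored)
    print(top_fraction([0.1, 2.0, 0.5, 3.0], 0.5))  # [1, 3]
```

Skipping unweighted genes, rather than raising an error, keeps the score defined when an individual carries variants in genes outside the scored set.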

The impact of rare variation on gene expression across tissues

10.1101/074443 ◽

2016 ◽

Cited By ~ 10

Author(s):

Xin Li ◽

Yungil Kim ◽

Emily K. Tsang ◽

Joe R. Davis ◽

Farhan N. Damani ◽

...

Keyword(s):

Gene Expression ◽

Rare Variants ◽

Disease Risk ◽

Whole Genome Sequencing Data ◽

Personal Genome ◽

Sequencing Data ◽

Potential Health ◽

Rare Variation ◽

Functional Variants ◽

The Impact

Rare genetic variants are abundant in humans, yet their functional effects are often unknown and challenging to predict. The Genotype-Tissue Expression (GTEx) project provides a unique opportunity to identify the functional impact of rare variants through combined analyses of whole genomes and multi-tissue RNA-sequencing data. Here, we identify gene expression outliers, or individuals with extreme expression levels, across 44 human tissues, and characterize the contribution of rare variation to these large changes in expression. We find that 58% of underexpression and 28% of overexpression outliers have underlying rare variants, compared with 9% of non-outliers. Large expression effects are enriched for proximal loss-of-function, splicing, and structural variants, particularly variants near the TSS and at evolutionarily conserved sites. Known disease genes have expression outliers, underscoring that rare variants can contribute to genetic disease risk. To prioritize functional rare regulatory variants, we develop RIVER, a Bayesian approach that integrates RNA and whole genome sequencing data from the same individual. RIVER predicts functional variants significantly better than models using genomic annotations alone, and is an extensible tool for personal genome interpretation. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues with potential health consequences, and provide an integrative method for interpreting rare variants in individual genomes.
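Expression outliers of the kind described above are commonly flagged by standardizing each gene's expression across individuals and applying a cut-off. The Python sketch below uses a single-tissue z-score rule with an assumed threshold of |z| >= 2. It assumes the expression values are already normalized. The published analysis involved further steps, such as combining evidence across tissues, which this sketch omits. The function name and the values are hypothetical.

```python
import statistics
from typing import Dict, List


def expression_outliers(expr: List[float], threshold: float = 2.0) -> Dict[str, List[int]]:
    """Indices of under- and over-expression outliers for one gene in one tissue.

    `expr` holds one (already normalized) expression value per individual;
    an individual is an outlier if its z-score magnitude reaches `threshold`.
    """
    if len(expr) < 2:
        raise ValueError("need at least two individuals")
    mu = statistics.mean(expr)
    sd = statistics.stdev(expr)  # sample standard deviation
    if sd == 0:
        return {"under": [], "over": []}  # no variation, so no outliers
    z = [(x - mu) / sd for x in expr]
    return {
        "under": [i for i, v in enumerate(z) if v <= -threshold],
        "over": [i for i, v in enumerate(z) if v >= threshold],
    }


if __name__ == "__main__":
    # Hypothetical values: nine similar individuals and one with very low expression.
    print(expression_outliers([10.0] * 9 + [0.0]))  # {'under': [9], 'over': []}
```

With small samples the attainable |z| is bounded. For n individuals, a sample z-score cannot exceed (n - 1)/sqrt(n) in magnitude, so the threshold should be chosen with the sample size in mind.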

Dissecting molecular regulatory mechanisms underlying noncoding susceptibility SNPs associated with 19 autoimmune diseases using multi-omics integrative analysis

10.1101/871384 ◽

2019 ◽

Author(s):

Xiao-Feng Chen ◽

Min-Rui Guo ◽

Yuan-Yuan Duan ◽

Feng Jiang ◽

Hao Wu ◽

...

Keyword(s):

Autoimmune Diseases ◽

Long Range ◽

Drug Targets ◽

Molecular Mechanisms ◽

Target Genes ◽

Immune Cell ◽

Drug Application ◽

Chromatin Interaction ◽

Genetic Associations ◽

Functional Variants

Genome-wide association studies (GWAS) have identified hundreds of susceptibility loci associated with autoimmune diseases. However, over 90% of risk variants are located in noncoding regions, which makes it very challenging to decipher the underlying causal functional variants and genes and their biological mechanisms. Previous studies focused on developing new scoring methods to prioritize functional or disease-relevant variants. However, they mainly incorporated annotation data across all cells and tissues while omitting cell-specific or context-specific regulation. Moreover, few analyses have dissected the detailed molecular regulatory circuits linking functional GWAS variants to disease etiology. Here we devised a new analysis framework that incorporates hundreds of immune cell-specific multi-omics datasets to prioritize functional noncoding susceptibility SNPs and their target genes, and to further dissect their downstream molecular mechanisms and clinical applications for 19 autoimmune diseases. Most prioritized SNPs show genetic associations with transcription factor (TF) binding, histone modification or chromatin accessibility, indicating allelic regulatory roles on their target genes. Their target genes were significantly enriched in immunologically related pathways and functions. We also detected long-range regulation of 90.7% of target genes, including 132 genes regulated exclusively by distal SNPs (e.g., CD28, IL2RA), which involves several potential key TFs (e.g., CTCF), suggesting important roles for long-range chromatin interaction in autoimmune diseases. Moreover, we identified hundreds of known or predicted druggable genes and predicted some new potential drug targets for several autoimmune diseases, including two genes (NFKB1, SH2B3) with known drug indications for other diseases, highlighting potential drug repurposing opportunities.
In summary, our analyses may provide a unique resource for future functional follow-up and drug application in autoimmune diseases; the results are freely available at http://fngwas.online/.

Author Summary

Autoimmune diseases are a group of complex immune system disorders with high prevalence and high heritability. Previous studies have uncovered thousands of SNPs associated with different autoimmune diseases. However, the molecular mechanisms underlying these genetic associations remain largely unknown. Strikingly, over 90% of risk SNPs are located in noncoding regions. By leveraging immune cell-specific multi-omics data spanning genomic, epigenetic, transcriptomic and 3D chromatin interaction information, we systematically analyzed the functional variants and genes and the biological mechanisms underlying genetic associations for 19 autoimmune diseases. We found that most functional SNPs may affect target gene expression by altering transcription factor (TF) binding, histone modification or chromatin accessibility. Most target genes had known immunological functions. We detected prevalent long-range chromatin interactions linking distal functional SNPs to target genes. We also identified many known drug targets and predicted some new drug target genes for several autoimmune diseases, suggesting potential clinical applications. All analysis results and tools are available online and may provide a unique resource for future functional follow-up and drug application. Our study may help narrow the gap between traditional genetic findings and mechanistic exploration of disease etiology, as well as clinical drug development.

Searching the Dark Genome for Alzheimer’s Disease Risk Variants

Brain Sciences ◽

10.3390/brainsci11030332 ◽

2021 ◽

Vol 11 (3) ◽

pp. 332

Author(s):

Rachel Raybould ◽

Rebecca Sims

Keyword(s):

Alzheimer's Disease ◽

Drug Targets ◽

Complex Disease ◽

Clinical Symptoms ◽

Disease Risk ◽

Protein Coding ◽

Risk Variants ◽

Review Current ◽

Novel Protein

Sporadic Alzheimer’s disease (AD) is a complex genetic disease and the leading cause of dementia worldwide. Over the past three decades, extensive pioneering research has discovered more than 70 common and rare genetic risk variants. These discoveries have contributed massively to our understanding of the pathogenesis of AD, but approximately half of the heritability of AD remains unaccounted for. Some regions of the genome are not assayed by mainstream genotyping and sequencing technologies. These regions, known as the Dark Genome, often harbour large structural DNA variants that are likely relevant to disease risk. Here, we describe the dark genome and review current technological and bioinformatics advances that will enable researchers to shed light on these hidden regions of the genome. We highlight the potential importance of the hidden genome in complex disease and how these strategies will assist in identifying the missing heritability of AD. Identification of novel protein-coding structural variation that increases risk of AD will open new avenues for translational research and new drug targets that have the potential for clinical benefit to delay or even prevent clinical symptoms of disease.

Relaxed selection during a recent human expansion

10.1101/064691 ◽

2016 ◽

Cited By ~ 4

Author(s):

S. Peischl ◽

I. Dupanloup ◽

A. Foucal ◽

M. Jomphe ◽

V. Bruat ◽

...

Keyword(s):

Wave Front ◽

Rare Variants ◽

Genetic Diseases ◽

Low Frequency ◽

Purifying Selection ◽

Genomic Diversity ◽

Human Populations ◽

Deleterious Mutations ◽

Relaxed Selection ◽

French Canadians

Humans have colonized the planet through a series of range expansions, which deeply impacted genetic diversity in newly settled areas and potentially increased the frequency of deleterious mutations on expanding wave fronts. To test this prediction, we studied the genomic diversity of French Canadians who colonized Quebec in the 17th century. We used historical information and records from ∼4000 ascending genealogies to select individuals whose ancestors lived mostly on the colonizing wave front and individuals whose ancestors remained in the core of the settlement. Comparison of exomic diversity reveals that i) both new and low frequency variants are significantly more deleterious in front than in core individuals, ii) equally deleterious mutations are at higher frequencies in front individuals, and iii) front individuals are two times more likely to be homozygous for rare very deleterious mutations present in Europeans. These differences have emerged in the past 6–9 generations and cannot be explained by differential inbreeding, but are consistent with relaxed selection on the wave front. Modeling the evolution of rare variants allowed us to estimate their associated selection coefficients as well as front and core effective sizes. Even though range expansions had a limited impact on the overall fitness of French Canadians, they could explain the higher prevalence of recessive genetic diseases in recently settled regions. Since we show that modern human populations are experiencing differential strength of purifying selection, similar processes might have happened throughout human history, contributing to a higher mutation load in populations that have undergone spatial expansions.

Genetic landscapes reveal how human genetic diversity aligns with geography

10.1101/233486 ◽

2017 ◽

Cited By ~ 3

Author(s):

Benjamin Marco Peter ◽

Desislava Petkova ◽

John Novembre

Keyword(s):

Genetic Diversity ◽

Population Structure ◽

Genetic Differentiation ◽

Rare Variants ◽

Disease Risk ◽

Geographic Distance ◽

Population History ◽

Human Populations ◽

Complex Processes ◽

Scale Population

Geographic patterns in human genetic diversity carry footprints of population history [1,2] and provide insights for genetic medicine and its application across human populations [3,4]. Summarizing and visually representing these patterns of diversity has been a persistent goal for human geneticists [5–10], and has revealed that genetic differentiation is frequently correlated with geographic distance. However, most analytical methods to represent population structure [11–15] do not incorporate geography directly, so it must be considered post hoc alongside a visual summary. Here, we use a recently developed spatially explicit method that estimates “effective migration” surfaces (the EEMS method [16]) to visualize how human genetic diversity is geographically structured. The resulting surfaces are “rugged”, which indicates that the relationship between genetic and geographic distance is, as a rule, heterogeneous and distorted. Most prominently, topographic and marine features regularly align with increased genetic differentiation (e.g., the Sahara Desert, the Mediterranean Sea or the Himalaya at large scales; the Adriatic or interisland straits in Near Oceania at smaller scales). In other cases, the locations of historical migrations and the boundaries of language families align with migration features. These results provide visualizations of human genetic diversity that reveal local patterns of differentiation in detail and emphasize that, while genetic similarity generally decays with geographic distance, there have regularly been factors that subtly distort the underlying relationship across space observed today. The fine-scale population structure depicted here is relevant to understanding complex processes of human population history and may provide insights into geographic patterning in rare variants and heritable disease risk.
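EEMS itself fits a spatial migration model and is not reproduced here. The Python sketch below gives a much simpler illustration of the baseline pattern the abstract refers to, genetic differentiation rising with geographic distance. It computes great-circle distances between hypothetical sampling locations with the haversine formula and correlates them with a user-supplied pairwise genetic distance matrix. The mean Earth radius of 6,371 km and all function names are assumptions for illustration. Pairwise distances are not independent, so assessing significance would need a permutation approach such as a Mantel test. This sketch only returns the correlation coefficient.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def geo_genetic_correlation(
    coords: List[Tuple[float, float]], gen_dist: List[List[float]]
) -> float:
    """Correlate geographic and genetic distances over all unordered pairs of sites."""
    geo, gen = [], []
    for i, j in combinations(range(len(coords)), 2):
        geo.append(haversine_km(coords[i], coords[j]))
        gen.append(gen_dist[i][j])
    return pearson(geo, gen)


if __name__ == "__main__":
    # Hypothetical sites along the equator, with genetic distance growing with separation.
    sites = [(0.0, 0.0), (0.0, 10.0), (0.0, 20.0)]
    gen = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
    print(geo_genetic_correlation(sites, gen))
```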
