ncVarDB: a manually curated database for pathogenic non-coding variants and benign controls

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Harry Biggs ◽  
Padmini Parthasarathy ◽  
Alexandra Gavryushkina ◽  
Paul P Gardner

Abstract Variants within the non-coding genome are frequently associated with phenotypes in genome-wide association studies. These non-coding regions may be involved in the regulation of gene expression, encode functional non-coding RNAs, or influence splicing and other cellular functions. We have curated a list of characterized non-coding human genome variants based on published evidence that indicates phenotypic consequences of the variation. To minimize annotation errors, two curators have independently verified the supporting evidence for pathogenicity of each non-coding variant in the published literature. The database consists of 721 non-coding variants linked to the published literature describing the evidence of functional consequences. We have also sampled 7228 covariate-matched benign controls, each with a population frequency of over 5%, from the single nucleotide polymorphism database (dbSNP151). These were sampled controlling for potential confounding factors such as linkage with pathogenic variants, annotation type (untranslated region, intron, intergenic, etc.) and variant type (substitution or indel). The dataset presented here represents a curated repository, with a potential use for the training or evaluation of algorithms used in the prediction of non-coding variant functionality. Database URL: https://github.com/Gardner-BinfLab/ncVarDB.
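The covariate-matched sampling described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record keys (`freq`, `annotation`, `variant_type`), the function name, and the default of ten controls per pathogenic variant are assumptions made for illustration, and the control for linkage with pathogenic variants is not modeled here.

```python
import random
from collections import defaultdict


def sample_matched_controls(pathogenic, candidates, per_case=10,
                            min_freq=0.05, seed=0):
    """Draw common variants that share annotation and variant type with
    each pathogenic variant (hypothetical record layout, see lead-in)."""
    rng = random.Random(seed)

    # Group eligible common variants by the covariates being matched.
    pools = defaultdict(list)
    for c in candidates:
        if c["freq"] > min_freq:
            pools[(c["annotation"], c["variant_type"])].append(c)
    for pool in pools.values():
        rng.shuffle(pool)

    controls = []
    for p in pathogenic:
        pool = pools[(p["annotation"], p["variant_type"])]
        # Pop from the shuffled pool so no control is drawn twice.
        for _ in range(min(per_case, len(pool))):
            controls.append(pool.pop())
    return controls
```

Matching inside strata keeps the control set's mix of annotation and variant types close to that of the pathogenic set, so a classifier cannot separate the two groups on those covariates alone.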

2016 ◽  
Author(s):  
Valentina Iotchkova ◽  
Graham R.S. Ritchie ◽  
Matthias Geihs ◽  
Sandro Morganella ◽  
Josine L. Min ◽  
...  

Loci discovered by genome-wide association studies (GWAS) predominantly map outside protein-coding genes. The interpretation of functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages GWAS findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding that current methods do not address. We further assess enrichment statistics for 27 GWAS traits within regulatory regions from the ENCODE and Roadmap projects. We characterise unique enrichment patterns for traits and annotations, driving novel biological insights. The method is implemented as standalone software and as an R package to facilitate its application by the research community.


2021 ◽  
pp. 1-10
Author(s):  
Sophie E. Legge ◽  
Marcos L. Santoro ◽  
Sathish Periyasamy ◽  
Adeniran Okewole ◽  
Arsalan Arsalan ◽  
...  

Abstract Schizophrenia is a severe psychiatric disorder with high heritability. Consortia efforts and technological advancements have led to a substantial increase in knowledge of the genetic architecture of schizophrenia over the past decade. In this article, we provide an overview of the current understanding of the genetics of schizophrenia, outline remaining challenges, and summarise future directions of research. World-wide collaborations have resulted in genome-wide association studies (GWAS) in over 56 000 schizophrenia cases and 78 000 controls, which identified 176 distinct genetic loci. The latest GWAS from the Psychiatric Genomics Consortium, available as a pre-print, indicates that 270 distinct common genetic loci have now been associated with schizophrenia. Polygenic risk scores can currently explain around 7.7% of the variance in schizophrenia case-control status. Rare variant studies have implicated eight rare copy-number variants, and an increased burden of loss-of-function variants in SETD1A, as increasing the risk of schizophrenia. The latest exome sequencing study, available as a pre-print, implicates a burden of rare coding variants in a further nine genes. Gene-set analyses have demonstrated significant enrichment of both common and rare genetic variants associated with schizophrenia in synaptic pathways. To address current challenges, future genetic studies of schizophrenia need increased sample sizes from more diverse populations. Continued expansion of international collaboration will likely identify new genetic regions, improve fine-mapping to identify causal variants, and increase our understanding of the biology and mechanisms of schizophrenia.


2021 ◽  
Author(s):  
Abhishek Nag ◽  
Lawrence Middleton ◽  
Ryan S Dhindsa ◽  
Dimitrios Vitsios ◽  
Eleanor M Wigmore ◽  
...  

Genome-wide association studies have established the contribution of common and low frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N=412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p < 1 × 10⁻⁸), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, SYT7 and creatinine, and others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, we observed that 40% of gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that utilises the gene-biomarker association statistics from the collapsing analysis to identify genes having similar biomarker fingerprints and thus expand our understanding of gene networks.
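The idea behind a gene-level collapsing analysis can be sketched as follows: rare qualifying variants are collapsed into one carrier indicator per gene and sample, and carriers are then compared with non-carriers. The abstract does not specify the statistical test or the qualifying-variant filters, so the record keys, the `is_qualifying` predicate, and the simple difference in means below are illustrative assumptions rather than the study's method.

```python
from collections import defaultdict
from statistics import mean


def collapse_by_gene(variant_calls, is_qualifying):
    """Map each gene to the set of samples carrying at least one
    qualifying variant in that gene (hypothetical record layout)."""
    carriers = defaultdict(set)
    for call in variant_calls:
        if is_qualifying(call):
            carriers[call["gene"]].add(call["sample"])
    return carriers


def carrier_mean_difference(carrier_set, biomarker):
    """Mean biomarker value in carriers minus mean in non-carriers.
    Raises statistics.StatisticsError if either group is empty."""
    inside = [v for s, v in biomarker.items() if s in carrier_set]
    outside = [v for s, v in biomarker.items() if s not in carrier_set]
    return mean(inside) - mean(outside)
```

Collapsing pools evidence across many individually rare variants, which is why a gene can reach significance even when no single variant does. A real analysis would replace the difference in means with a regression that adjusts for covariates such as age, sex, and ancestry.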


2014 ◽  
Author(s):  
Daniel S Himmelstein ◽  
Sergio E Baranzini

The first decade of Genome Wide Association Studies (GWAS) has uncovered a wealth of disease-associated variants. Two important derivations will be the translation of this information into a multiscale understanding of pathogenic variants, and leveraging existing data to increase the power of existing and future studies through prioritization. We explore edge prediction on heterogeneous networks (graphs with multiple node and edge types) for accomplishing both tasks. First, we constructed a network with 18 node types, namely genes, diseases, tissues, pathophysiologies, and 14 MSigDB (molecular signatures database) collections, and 19 edge types from high-throughput publicly available resources. From this network composed of 40,343 nodes and 1,608,168 edges, we extracted features that describe the topology between specific genes and diseases. Next, we trained a model from GWAS associations and predicted the probability of association between each protein-coding gene and each of 29 well-studied complex diseases. The model, which achieved 132-fold enrichment in precision at 10% recall, outperformed any individual domain, highlighting the benefit of integrative approaches. We identified pleiotropy, transcriptional signatures of perturbations, pathways, and protein interactions as fundamental mechanisms explaining pathogenesis. Our method successfully predicted the results (with AUROC = 0.79) from a withheld multiple sclerosis (MS) GWAS despite starting with only 13 previously associated genes. Finally, we combined our network predictions with statistical evidence of association to propose four novel MS genes, three of which (JAK2, REL, RUNX3) validated on the masked GWAS. Furthermore, our predictions provide biological support highlighting REL as the causal gene within its gene-rich locus. Users can browse all predictions online (http://het.io).
Heterogeneous network edge prediction effectively prioritizes genetic associations and provides a powerful new approach for data integration across multiple domains.
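The "fold enrichment in precision at 10% recall" reported above can be read as the precision at the score cutoff where 10% of true associations are recovered, divided by the precision expected from random ranking (the prevalence of positives). The sketch below follows that reading; the abstract does not give the exact definition, so the function and its parameters are assumptions.

```python
def precision_fold_enrichment(scores, labels, recall_target=0.10):
    """Precision at the first cutoff reaching `recall_target`, divided by
    the overall positive rate. `labels` holds 1 (positive) or 0."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    prevalence = total_pos / len(labels)

    # Walk down the ranking from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= recall_target:
            return (true_pos / k) / prevalence
    raise ValueError("recall target was never reached")
```

A value of 1 means the ranking is no better than chance at that recall level, so a large ratio indicates that top-ranked gene-disease pairs are far denser in true associations than the background.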


2021 ◽  
pp. JCO.20.01992
Author(s):  
Chi Gao ◽  
Eric C. Polley ◽  
Steven N. Hart ◽  
Hongyan Huang ◽  
Chunling Hu ◽  
...  

PURPOSE This study assessed the joint association of pathogenic variants (PVs) in breast cancer (BC) predisposition genes and polygenic risk scores (PRS) with BC in the general population.

METHODS A total of 26,798 non-Hispanic white BC cases and 26,127 controls from predominately population-based studies in the Cancer Risk Estimates Related to Susceptibility consortium were evaluated for PVs in BRCA1, BRCA2, ATM, CHEK2, PALB2, BARD1, BRIP1, CDH1, and NF1. PRS based on 105 common variants were created using effect estimates from BC genome-wide association studies; the performance of an overall BC PRS and estrogen receptor–specific PRS were evaluated. The odds of BC based on the PVs and PRS were estimated using penalized logistic regression. The results were combined with age-specific incidence rates to estimate 5-year and lifetime absolute risks of BC across percentiles of PRS by PV status and first-degree family history of BC.

RESULTS The estimated lifetime risks of BC among general-population noncarriers, based on 10th and 90th percentiles of PRS, were 9.1%-23.9% and 6.7%-18.2% for women with or without first-degree relatives with BC, respectively. Taking PRS into account, more than 95% of BRCA1, BRCA2, and PALB2 carriers had > 20% lifetime risks of BC, whereas, respectively, 52.5% and 69.7% of ATM and CHEK2 carriers without first-degree relatives with BC, and 78.8% and 89.9% of those with a first-degree relative with BC had > 20% risk.

CONCLUSION PRS facilitates personalization of BC risk among carriers of PVs in predisposition genes. Incorporating PRS into BC risk estimation may help identify > 30% of CHEK2 and nearly half of ATM carriers below the 20% lifetime risk threshold, suggesting the addition of PRS may prevent overscreening and enable more personalized risk management approaches.
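A polygenic risk score of the kind described in the methods is conventionally a weighted sum of risk-allele counts, with weights taken from GWAS effect estimates (for example, log odds ratios). The sketch below shows only that standard construction; the variant identifiers and weights are hypothetical, and it does not reproduce the study's 105-variant score or its penalized logistic regression.

```python
def polygenic_risk_score(dosages, weights):
    """Sum of risk-allele dosages (0, 1 or 2) weighted by per-variant
    effect estimates. Variants without a weight are ignored."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical effect estimates (log odds ratios) and one person's genotypes.
example_weights = {"rsA": 0.10, "rsB": -0.05}
example_dosages = {"rsA": 2, "rsB": 1, "rsC": 1}
example_score = polygenic_risk_score(example_dosages, example_weights)
```

Scores computed this way are usually standardized against a reference population before being expressed as percentiles, which is how the 10th and 90th percentile risks in the results would be defined.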


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Gerard A Bouland ◽  
Joline W J Beulens ◽  
Joey Nap ◽  
Arno R van der Slik ◽  
Arnaud Zaldumbide ◽  
...  

Abstract Numerous large genome-wide association studies have been performed to understand the influence of genetics on traits. Many identified risk loci are in non-coding and intergenic regions, which complicates understanding how genes and their downstream pathways are influenced. An integrative data approach is required to understand the mechanism and consequences of identified risk loci. Here, we developed the R package CONQUER. Data for SNPs of interest are acquired from static and dynamic repositories (build GRCh38/hg38), including GTExPortal, Epigenomics Project, 4D genome database and genome browsers. All visualizations are fully interactive so that the user can immediately access the underlying data. CONQUER is a user-friendly tool for applying an integrative approach to multiple SNPs, where risk loci are not seen as individual risk factors but rather as a network of risk factors.


2020 ◽  
Vol 12 (563) ◽  
pp. eaaz2541
Author(s):  
Leah K. Cuddy ◽  
Dmitry Prokopenko ◽  
Eric P. Cunningham ◽  
Ross Brimberry ◽  
Peter Song ◽  
...  

Recent genome-wide association studies identified the angiotensin-converting enzyme gene (ACE) as an Alzheimer’s disease (AD) risk locus. However, the pathogenic mechanism by which ACE causes AD is unknown. Using whole-genome sequencing, we identified rare ACE coding variants in AD families and investigated one, ACE1 R1279Q, in knockin (KI) mice. Similar to AD, ACE1 was increased in neurons, but not microglia or astrocytes, of KI brains, which became elevated further with age. Angiotensin II (angII) and angII receptor AT1R signaling were also increased in KI brains. Autosomal dominant neurodegeneration and neuroinflammation occurred with aging in KI hippocampus, which were absent in the cortex and cerebellum. Female KI mice exhibited greater hippocampal electroencephalograph disruption and memory impairment compared to males. ACE variant effects were more pronounced in female KI mice, suggesting a mechanism for higher AD risk in women. Hippocampal neurodegeneration was completely rescued by treatment with brain-penetrant drugs that inhibit ACE1 and AT1R. Although ACE variant-induced neurodegeneration did not depend on β-amyloid (Aβ) pathology, amyloidosis in 5XFAD mice crossed to KI mice accelerated neurodegeneration and neuroinflammation, whereas Aβ deposition was unchanged. KI mice had normal blood pressure and cerebrovascular functions. Our findings strongly suggest that increased ACE1/angII signaling causes aging-dependent, Aβ-accelerated selective hippocampal neuron vulnerability and female susceptibility, hallmarks of AD that have hitherto been enigmatic. We conclude that repurposed brain-penetrant ACE inhibitors and AT1R blockers may protect against AD.


2017 ◽  
Vol 242 (13) ◽  
pp. 1325-1334 ◽  
Author(s):  
Yizhou Zhu ◽  
Cagdas Tazearslan ◽  
Yousin Suh

Genome-wide association studies have shown that the vast majority of disease-associated variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes contribute to disease risk. Identifying truly causal non-coding variants and their affected target genes remains challenging but is a critical step in translating genetic associations into molecular mechanisms and ultimately clinical applications. Here we review genomic/epigenomic resources and in silico tools that can be used to identify causal non-coding variants and experimental strategies to validate their functionalities.

Impact statement Most signals from genome-wide association studies (GWASs) map to the non-coding genome, and functional interpretation of these associations remains challenging. We review recent progress in methodologies for studying the non-coding genome and argue that no single approach allows one to effectively identify the causal regulatory variants from GWAS results. By illustrating the advantages and limitations of each method, our review may provide a guideline for taking a combinatorial approach to accurately predict, prioritize, and eventually experimentally validate the causal variants.

