Assessing the performance of genome-wide association studies for predicting disease risk

2019 ◽  
Author(s):  
Jonas Patron ◽  
Arnau Serra-Cayuela ◽  
Beomsoo Han ◽  
Carin Li ◽  
David Scott Wishart

Abstract To date, more than 3700 genome-wide association studies (GWAS) have been published that look at the genetic contributions of single nucleotide polymorphisms (SNPs) to human conditions or human phenotypes. Through these studies many highly significant SNPs have been identified for hundreds of diseases or medical conditions. However, the extent to which GWAS-identified SNPs or combinations of SNP biomarkers can predict disease risk is not well known. One of the most commonly used approaches to assess the performance of predictive biomarkers is to determine the area under the receiver-operator characteristic curve (AUROC). We have developed an R package called G-WIZ to generate ROC curves and calculate the AUROC using summary-level GWAS data. We first tested the performance of G-WIZ by using AUROC values derived from patient-level SNP data, as well as literature-reported AUROC values. We found that G-WIZ predicts the AUROC with <3% error. Next, we used the summary-level GWAS data from GWAS Central to determine the ROC curves and AUROC values for 569 different GWA studies spanning 219 different conditions. Using these data we found a small number of GWA studies with SNP-derived risk predictors that have very high AUROCs (>0.75). On the other hand, the average GWA study produces a multi-SNP risk predictor with an AUROC of 0.55. Detailed AUROC comparisons indicate that most SNP-derived risk predictions are not as good as clinically based disease risk predictors. All our calculations (ROC curves, AUROCs, explained heritability) are in a publicly accessible database called GWAS-ROCS (http://gwasrocs.ca). The G-WIZ code is freely available for download at https://github.com/jonaspatronjp/GWIZ-Rscript/.
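To make the AUROC metric concrete, the following minimal Python sketch computes it from individual-level risk scores using the rank-based (Mann-Whitney) identity: the AUROC equals the probability that a randomly chosen case scores higher than a randomly chosen control. This is not the G-WIZ procedure, which works from summary-level GWAS data; the function name `auroc` and the example scores are hypothetical and chosen only for illustration.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-based (Mann-Whitney) identity.

    scores: predicted risk per individual (higher means more at risk).
    labels: 1 for cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs in which the case scores higher; ties count half.
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores for four cases (1) and four controls (0).
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.3, 0.2, 0.5]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(auroc(scores, labels))  # 13 of 16 pairs are ranked correctly: 0.8125
```

An AUROC of 0.5 corresponds to a predictor that ranks cases and controls no better than chance, which puts the reported average of 0.55 in context.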

Genes ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 97
Author(s):  
Rosella Mechelli ◽  
Renato Umeton ◽  
Grazia Manfrè ◽  
Silvia Romano ◽  
Maria Chiara Buscarinu ◽  
...  

Genome-wide association studies have identified more than 200 multiple sclerosis (MS)-associated loci across the human genome over the last decade, suggesting complexity in the disease etiology. This complexity poses at least two challenges: the definition of an etiological model including the impact of nongenetic factors, and the clinical translation of genomic data that may be drivers for new druggable targets. We reviewed studies dealing with single genes of interest, to understand how MS-associated single nucleotide polymorphism (SNP) variants affect the expression and the function of those genes. We then surveyed studies on the bioinformatic reworking of genome-wide association studies (GWAS) data, with aggregate analyses of many GWAS loci, each contributing with a small effect to the overall disease predisposition. These investigations uncovered new information, especially when combined with nongenetic factors having possible roles in the disease etiology. In this context, the interactome approach, defined as “modules of genes whose products are known to physically interact with environmental or human factors with plausible relevance for MS pathogenesis”, will be reported in detail. For a future perspective, a polygenic risk score, defined as a cumulative risk derived from aggregating the contributions of many DNA variants associated with a complex trait, may be integrated with data on environmental factors affecting the disease risk or protection.
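The polygenic risk score described above can be sketched as a weighted sum of risk-allele counts. The Python example below is an assumption-laden illustration: the abstract only defines the score as a cumulative risk aggregated over many variants, so the log-odds-ratio weighting, the function name `polygenic_risk_score`, and the odds ratios are all hypothetical.

```python
import math


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is required per variant")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-allele odds ratios for three variants; using log(OR) as
# the weight is a common convention, assumed here rather than taken from
# the abstract.
odds_ratios = [1.2, 1.1, 0.9]
weights = [math.log(o) for o in odds_ratios]

# One individual carrying 2, 1 and 0 copies of the respective risk alleles.
person = [2, 1, 0]
score = polygenic_risk_score(person, weights)
print(score)  # equals 2*log(1.2) + 1*log(1.1)
```

Environmental factors of the kind the review discusses would enter such a model as additional terms alongside the genetic score.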


2020 ◽  
Author(s):  
Lethukuthula L Nkambule

Abstract Summary: Although genome-wide association study (GWAS) data are growing exponentially and are extensively available, visualizing these data remains difficult for non-specialist users. Current software and packages for visualizing GWAS data are intended for specialists and have been developed to accomplish specific functions, favouring functionality over user experience. To address this, we have developed an R Shiny web application, gwaRs, that allows any general user to visualize GWAS data efficiently and effortlessly. The gwaRs web-browser interface allows users to visualize GWAS data using SNP-density, quantile-quantile, Manhattan, and Principal Component Analysis plots. Availability: The gwaRs web application is publicly hosted at https://gwasviz.shinyapps.io/gwaRs/ and R source code is released under the GNU General Public License and freely available at GitHub: https://github.com/LindoNkambule/


Author(s):  
Huaqing Zhao ◽  
Nandita Mitra ◽  
Peter A. Kanetsky ◽  
Katherine L. Nathanson ◽  
Timothy R. Rebbeck

Abstract Genome-wide association studies (GWAS) are susceptible to bias due to population stratification (PS). The most widely used method to correct bias due to PS is principal components (PCs) analysis (PCA), but there is no objective method to guide which PCs to include as covariates. Often, the ten PCs with the highest eigenvalues are included to adjust for PS. This selection is arbitrary, and patterns of local linkage disequilibrium may affect PCA corrections. To address these limitations, we estimate genomic propensity scores based on all statistically significant PCs selected by the Tracy-Widom (TW) statistic. We compare a principal components and propensity scores (PCAPS) approach to PCA and EMMAX using simulated GWAS data under no, moderate, and severe PS. PCAPS reduced spurious genetic associations regardless of the degree of PS, resulting in odds ratio (OR) estimates closer to the true OR. We illustrate our PCAPS method using GWAS data from a study of testicular germ cell tumors. PCAPS provided a more conservative adjustment than PCA. Advantages of the PCAPS approach include reduction of bias compared to PCA, consistent selection of propensity scores to adjust for PS, the potential ability to handle outliers, and ease of implementation using existing software packages.
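As a rough illustration of the first step in PCA-based correction, the sketch below derives the leading principal component of a small genotype matrix in plain Python. It covers only the computation of PC scores per individual; the Tracy-Widom selection of significant PCs and the propensity-score step of PCAPS are not implemented. The function names, the power-iteration approach, and the toy genotypes are assumptions for illustration, and a real analysis would rely on established software.

```python
import math


def standardize_columns(genotypes):
    """Center each SNP column and scale it to unit variance.

    Monomorphic SNPs (zero variance) carry no information and are dropped.
    """
    n = len(genotypes)
    columns = []
    for col in zip(*genotypes):
        mean = sum(col) / n
        var = sum((g - mean) ** 2 for g in col) / n
        if var > 0:
            sd = math.sqrt(var)
            columns.append([(g - mean) / sd for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs to analyse")
    # Transpose back to one row per individual.
    return [list(row) for row in zip(*columns)]


def first_pc(genotypes, iterations=200):
    """Leading eigenvector of the individual-by-individual Gram matrix,
    found by power iteration; one score per individual."""
    z = standardize_columns(genotypes)
    n = len(z)
    gram = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)]
            for i in range(n)]
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy allele counts: the first three individuals come from one hypothetical
# subpopulation, the last three from another with different allele frequencies.
genotypes = [
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 2, 2],
    [0, 0, 2, 1],
    [0, 1, 1, 2],
]
pc1 = first_pc(genotypes)
print(pc1)  # scores for the two groups take opposite signs
```

In a PCA-adjusted association test, scores like `pc1` would be included as covariates so that allele-frequency differences between subpopulations are not mistaken for disease associations.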


2010 ◽  
Vol 28 (1) ◽  
pp. E2 ◽  
Author(s):  
Matthew C. Cowperthwaite ◽  
Deepankar Mohanty ◽  
Mark G. Burnett

As their power and utility increase, genome-wide association (GWA) studies are poised to become an important element of the neurosurgeon's toolkit for diagnosing and treating disease. In this paper, the authors review recent findings and discuss issues associated with gathering and analyzing GWA data for the study of neurological diseases and disorders, including those of neurosurgical importance. Their goal is to provide neurosurgeons and other clinicians with a better understanding of the practical and theoretical issues associated with this line of research. A modern GWA study involves testing hundreds of thousands of genetic markers across an entire genome, often in thousands of individuals, for any significant association with a particular disease. The number of markers assayed in a study presents several practical and theoretical issues that must be considered when planning the study. Genome-wide association studies show great promise in our understanding of the genes underlying common neurological diseases and disorders, as well as in leading to a new generation of genetic tests for clinicians.


2017 ◽  
Vol 242 (13) ◽  
pp. 1325-1334 ◽  
Author(s):  
Yizhou Zhu ◽  
Cagdas Tazearslan ◽  
Yousin Suh

Genome-wide association studies have shown that the vast majority of disease-associated variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes contribute to disease risk. Identifying truly causal non-coding variants and their affected target genes remains challenging but is a critical step in translating genetic associations into molecular mechanisms and, ultimately, clinical applications. Here we review genomic/epigenomic resources and in silico tools that can be used to identify causal non-coding variants, and experimental strategies to validate their functionality. Impact statement: Most signals from genome-wide association studies (GWASs) map to the non-coding genome, and functional interpretation of these associations remains challenging. We reviewed recent progress in methodologies for studying the non-coding genome and argued that no single approach allows one to effectively identify the causal regulatory variants from GWAS results. By illustrating the advantages and limitations of each method, our review potentially provides a guideline for taking a combinatorial approach to accurately predict, prioritize, and eventually experimentally validate the causal variants.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3618 ◽  
Author(s):  
Rana Dajani ◽  
Jin Li ◽  
Zhi Wei ◽  
Michael E. March ◽  
Qianghua Xia ◽  
...  

The prevalence of Type II Diabetes (T2D) has been increasing, and the disease has become a significant public health burden in Jordan. No previous genome-wide association study (GWAS) has specifically investigated Middle Eastern populations. The Circassian and Chechen communities in Jordan represent unique populations that are genetically distinct from the Arab population and other populations in the Caucasus. Prevalence of T2D is very high in both the Circassian and Chechen communities in Jordan despite low obesity prevalence. We conducted GWAS on T2D in these two populations and further performed meta-analysis of the results. We identified a novel T2D locus at chr20p12.2 at genome-wide significance (rs6134031, P = 1.12 × 10⁻⁸) and we replicated the results in the Wellcome Trust Case Control Consortium (WTCCC) dataset. Another locus at chr12q24.31 is associated with T2D at suggestive significance level (top SNP rs4758690, P = 4.20 × 10⁻⁵); it is a robust eQTL for the gene MLXIP (P = 1.10 × 10⁻¹⁴) and is significantly associated with methylation level in MLXIP, whose functions involve cellular glucose response. Therefore, in this first GWAS of T2D in Jordan subpopulations, we identified novel and unique susceptibility loci which may help inform the genetic underpinnings of T2D in other populations.


2018 ◽  
Author(s):  
John A Lees ◽  
Marco Galardini ◽  
Stephen D Bentley ◽  
Jeffrey N Weiser ◽  
Jukka Corander

Abstract Summary: Genome-wide association studies (GWAS) in microbes face different challenges from those in eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results. Availability and Implementation: pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available online. Supplementary information: Supplementary data are available online.


2018 ◽  
Author(s):  
Jianan Zhana ◽  
Jessica van Setten ◽  
Jennifer Brody ◽  
Brenton Swenson ◽  
Anne M. Butler ◽  
...  

Abstract Motivation: Genome-wide association studies have had great success in identifying human genetic variants associated with disease, disease risk factors, and other biomedical phenotypes. Many variants are associated with multiple traits, even after correction for trait-trait correlation. Discovering subsets of variants associated with a shared subset of phenotypes could help reveal disease mechanisms, suggest new therapeutic options, and increase the power to detect additional variants with similar patterns of association. Here we introduce two methods based on a Bayesian framework, SNP And Pleiotropic PHenotype Organization (SAPPHO), one modeling independent phenotypes (SAPPHO-I) and the other incorporating a full phenotype covariance structure (SAPPHO-C). These two methods learn patterns of pleiotropy from genotype and phenotype data, using identified associations to discover additional associations with shared patterns. Results: The SAPPHO methods, along with other recent approaches for pleiotropic association tests, were assessed using data from the Atherosclerotic Risk in Communities (ARIC) study of 8,000 individuals, whose gold-standard associations were provided by meta-analysis of 40,000 to 100,000 individuals from the CHARGE consortium. Using power to detect gold-standard associations at genome-wide significance (0.05 family-wise error rate) as a metric, SAPPHO performed best. The SAPPHO methods were also uniquely able to select the most significant variants in a parsimonious model, excluding other less likely variants within a linkage disequilibrium block. For meta-analysis, the SAPPHO methods implement summary modes that use sufficient statistics rather than full phenotype and genotype data. Meta-analysis applied to CHARGE detected 16 additional associations to the gold-standard loci, as well as 124 novel loci, at 0.05 false discovery rate. Reasons for the superior performance were explored by performing simulations over a range of scenarios describing different genetic architectures. With SAPPHO we were able to learn genetic structures that were hidden using the traditional univariate tests. Availability: https://bitbucket.org/baderlab/fast/wiki/Home. SAPPHO software is available under the GNU General Public License, v2.
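The two error-rate criteria named in this abstract, a 0.05 family-wise error rate and a 0.05 false discovery rate, can be contrasted with a short Python sketch. The abstract does not say which procedures were used, so the Bonferroni and Benjamini-Hochberg methods shown here are standard choices assumed for illustration; the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Control the family-wise error rate: test each p-value at alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Control the false discovery rate with the step-up procedure:
    reject the k smallest p-values, where k is the largest rank with
    p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for five variants.
p_values = [0.001, 0.012, 0.02, 0.035, 0.6]
print(bonferroni(p_values))          # [True, False, False, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, True, False]
```

The example shows why a false discovery rate criterion typically admits more loci than a family-wise criterion at the same nominal level.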


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Fabricio Almeida-Silva ◽  
Thiago M. Venancio

Abstract Soybean is one of the most important legume crops worldwide. However, soybean yield is dramatically affected by fungal diseases, leading to economic losses of billions of dollars yearly. Here, we integrated publicly available genome-wide association studies and transcriptomic data to prioritize candidate genes associated with resistance to Cadophora gregata, Fusarium graminearum, Fusarium virguliforme, Macrophomina phaseolina, and Phakopsora pachyrhizi. We identified 188, 56, 11, 8, and 3 high-confidence candidates for resistance to F. virguliforme, F. graminearum, C. gregata, M. phaseolina and P. pachyrhizi, respectively. The prioritized candidate genes are highly conserved in the pangenome of cultivated soybeans and are heavily biased towards fungal species-specific defense responses. The vast majority of the prioritized candidate resistance genes are related to plant immunity processes, such as recognition, signaling, oxidative stress, systemic acquired resistance, and physical defense. Based on the number of resistance alleles, we selected the five most resistant accessions against each fungal species in the soybean USDA germplasm. Interestingly, the most resistant accessions do not reach the maximum theoretical resistance potential. Hence, they can be further improved to increase resistance in breeding programs or through genetic engineering. Finally, the coexpression network generated here is available in a user-friendly web application (https://soyfungigcn.venanciogroup.uenf.br/) and an R/Shiny package (https://github.com/almeidasilvaf/SoyFungiGCN) that serve as a public resource to explore soybean-pathogenic fungi interactions at the transcriptional level.

