A Regression-based Framework for Scalable Pathway-guided Search in Genome-wide Association Studies

2017 ◽  
Author(s):  
Shrayashi Biswas ◽  
Soumen Pal ◽  
Samsiddhi Bhattacharjee

Abstract Traditional unbiased genome-wide association studies (GWAS) have successfully identified thousands of loci associated with various complex diseases, but there is evidence to suggest that many variants were missed at stringent genome-wide thresholds. Fortunately, there is a rapidly increasing amount of prior knowledge in publicly available genomic datasets and biological databases that can be harnessed to enhance the power of discovering SNPs/genes from existing or new GWAS datasets. For most diseases, many of the identified loci tend to cluster into a few specific biological pathways/networks. From the point of view of disease etiology, such clustering is generally to be expected. This phenomenon can be exploited to conduct a more powerful genome-wide scan that is tailored to identify loci that are interconnected in pathways. We propose a scalable regression-based analytical framework to enable such a pathway-guided GWAS and demonstrate that it provides significant gains in power to detect disease-associated SNPs. Our method requires two inputs, namely a) genome-wide summary level data (e.g., SNP p-values) and b) a grouping of genes into biologically meaningful categories (e.g., a database of pathways). It automatically adjusts the input p-values by incorporating the knowledge derived adaptively from the data and the pathways specified. The method involves a regularized logistic regression analysis to derive priors for each SNP and then re-weights the p-values of SNPs so as to maximize the overall power of making discoveries. It increases the power to discover SNPs co-clustering into some of these pathways, while maintaining the global type-1 error (FWER) at the desired level. We used whole-genome simulations and summary data from real GWA studies of psoriasis, SLE, coronary artery disease and type-2 diabetes to illustrate the power improvement achieved by pathway-guided search. Our pipeline, implemented as an R package, can flexibly handle a large number of prior annotations, possibly derived from multiple databases.
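
The re-weighting idea can be illustrated with a generic weighted Bonferroni procedure. The sketch below is not the authors' R package: the function name `weighted_bonferroni`, the toy p-values and the prior values are hypothetical, and the priors are simply assumed to come from some upstream model. Weights are rescaled to average one, which is the standard condition under which weighted Bonferroni keeps the FWER at the nominal level when the weights are fixed independently of the p-values.

```python
def weighted_bonferroni(pvalues, priors, alpha=0.05):
    """Return one reject/keep decision per SNP using prior-derived weights.

    pvalues: raw association p-values (hypothetical inputs).
    priors:  non-negative prior scores, one per SNP (assumed to be given).
    """
    if len(pvalues) != len(priors):
        raise ValueError("pvalues and priors must have the same length")
    m = len(pvalues)
    total = sum(priors)
    if total <= 0:
        raise ValueError("at least one prior must be positive")
    # Rescale priors so that the weights average exactly one.
    weights = [m * q / total for q in priors]
    threshold = alpha / m
    # A SNP is rejected when its p-value is below its own weighted threshold.
    return [p <= w * threshold for p, w in zip(pvalues, weights)]
```

With equal priors this reduces to the ordinary Bonferroni correction; up-weighting a SNP relaxes its threshold at the cost of tightening the thresholds of down-weighted SNPs.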

2010 ◽  
Vol 49 (06) ◽  
pp. 632-640 ◽  
Author(s):  
J. Hebebrand ◽  
H.-E. Wichmann ◽  
K.-H. Jöckel ◽  
A. Scherag

Summary Background: Genome-wide association studies (GWAS) were highly successful in identifying new susceptibility loci of complex traits. Such studies usually start with genotyping fixed arrays of genetic markers in an initial sample. Out of these markers, some are selected which will be further genotyped in independent samples. Due to the very low a priori probability of a true positive association, the vast majority of all marker signals will turn out to be false positive. Thus, several methods to sort marker data have been proposed which will be evaluated here. Objectives: We compared statistical properties of ranking by p-values, q-values, the False Positive Report Probability (FPRP) and the Bayesian False-Discovery Probability (BFDP). Methods: We performed simulation studies for a genomic region derived from GWAS data sets and calculated descriptive statistics as well as mean square errors with regard to the true marker ranking. Additionally, we applied all measures to a GWAS for early onset extreme obesity superimposing a priori information on candidate genes. Results: Despite the known, more extreme probability results for traditional p-values, we observed that both p-values and the BFDP were more precise in reconstructing the “true” order of the markers in a region. In addition, the BFDP was useful to attenuate unexpected effects at a genome-wide scale. Conclusions: For the purpose of selecting markers from an initial GWAS and within the limits of this study, we recommend either ranking by p-values or the application of a full Bayesian approach for which the BFDP is a first approximation.
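
Two of the ranking measures named above can be written down compactly. The sketch below uses the commonly cited definitions: the FPRP as the posterior probability of no association given a p-value threshold, a prior and a power, and the BFDP built on Wakefield's approximate Bayes factor. The function names, argument names and the normal prior on the effect (`prior_sd`) are assumptions for illustration; the abstract does not state which settings the study used.

```python
import math


def fprp(p_value, prior_alt, power):
    """False Positive Report Probability for a single marker.

    p_value:   observed p-value, used as the significance level.
    prior_alt: prior probability that the marker is truly associated.
    power:     power to detect the assumed effect at that level.
    """
    false_pos = p_value * (1.0 - prior_alt)
    true_pos = power * prior_alt
    return false_pos / (false_pos + true_pos)


def bfdp(beta_hat, se, prior_alt, prior_sd):
    """Bayesian False-Discovery Probability via an approximate Bayes factor.

    beta_hat, se: estimated log odds ratio and its standard error.
    prior_sd:     standard deviation of the N(0, prior_sd^2) effect prior.
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta_hat / se
    # Approximate Bayes factor of the null against the alternative.
    abf = math.sqrt((v + w) / v) * math.exp(-0.5 * z * z * w / (v + w))
    prior_odds_null = (1.0 - prior_alt) / prior_alt
    return abf * prior_odds_null / (1.0 + abf * prior_odds_null)
```

Both quantities shrink as the evidence grows, but unlike the p-value they depend explicitly on the prior probability of association, which is why they can reorder markers when candidate-gene information is superimposed.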


2019 ◽  
Vol 22 (8) ◽  
pp. 1063-1069 ◽  
Author(s):  
N. S. Yudin ◽  
N. L. Podkolodnyy ◽  
T. A. Agarkova ◽  
E. V. Ignatieva

Selection by means of genetic markers is a promising approach to the eradication of infectious diseases in farm animals, especially in the absence of effective methods of treatment and prevention. Bovine leukemia virus (BLV) is spread throughout the world and represents one of the biggest problems for the livestock production and food security in Russia. However, recent genome-wide association studies have shown that sensitivity/resistance to BLV is polygenic. The aim of this study was to create a catalog of cattle genes and genes of other mammalian species involved in the pathogenesis of BLV-induced infection and to perform gene prioritization using bioinformatics methods. Based on manually collected information from a range of open sources, a total of 446 genes were included in the catalog of cattle genes and genes of other mammals involved in the pathogenesis of BLV-induced infection. The following criteria were used to prioritize 446 genes from the catalog: (1) the gene is associated with leukemia according to a genome-wide association study; (2) the gene is associated with leukemia according to a case-control study; (3) the role of the gene in leukemia development has been studied using knockout mice; (4) protein-protein interactions exist between the gene-encoded protein and either viral particles or individual viral proteins; (5) the gene is annotated with Gene Ontology terms that are overrepresented for a given list of genes; (6) the gene participates in biological pathways from the KEGG or REACTOME databases, which are over-represented for a given list of genes; (7) the protein encoded by the gene has a high number of protein-protein interactions with proteins encoded by other genes from the catalog. Based on each criterion, a rank was assigned to each gene. Then the ranks were summarized and an overall rank was determined. 
Prioritization of the 446 candidate genes allowed us to identify 5 genes of interest (TNF, LTB, BOLA-DQA1, BOLA-DRB3, ATF2), which can affect the sensitivity/resistance of cattle to leukemia.
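
The per-criterion ranking and aggregation described above can be sketched as follows. This is a minimal illustration, assuming that the per-criterion ranks are summed to obtain the overall rank and that every gene has a score for every criterion; the abstract does not specify tie handling or the exact aggregation, and the gene labels and scores in any call are made up.

```python
def rank_scores(scores):
    """Rank genes by score for one criterion (1 = best).

    scores: dict mapping gene name to a numeric score, higher is better.
    Tied scores share the smaller rank (competition ranking).
    """
    ordered = sorted(scores.values(), reverse=True)
    return {gene: ordered.index(score) + 1 for gene, score in scores.items()}


def overall_ranking(criteria):
    """Sum per-criterion ranks and return genes ordered best to worst.

    criteria: list of score dicts, one per prioritization criterion.
    Ties in the rank sum are broken alphabetically for reproducibility.
    """
    totals = {}
    for scores in criteria:
        for gene, rank in rank_scores(scores).items():
            totals[gene] = totals.get(gene, 0) + rank
    return sorted(totals, key=lambda gene: (totals[gene], gene))
```

Summing ranks rather than raw scores keeps criteria with very different scales (for example interaction counts versus binary GWAS evidence) from dominating the overall order.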


2018 ◽  
Vol 28 (1) ◽  
pp. 166-174 ◽  
Author(s):  
Sara L Pulit ◽  
Charli Stoneman ◽  
Andrew P Morris ◽  
Andrew R Wood ◽  
Craig A Glastonbury ◽  
...  

Abstract More than one in three adults worldwide is either overweight or obese. Epidemiological studies indicate that the location and distribution of excess fat, rather than general adiposity, are more informative for predicting risk of obesity sequelae, including cardiometabolic disease and cancer. We performed a genome-wide association study meta-analysis of body fat distribution, measured by waist-to-hip ratio (WHR) adjusted for body mass index (WHRadjBMI), and identified 463 signals in 346 loci. Heritability and variant effects were generally stronger in women than men, and we found approximately one-third of all signals to be sexually dimorphic. The 5% of individuals carrying the most WHRadjBMI-increasing alleles were 1.62 times more likely than the bottom 5% to have a WHR above the thresholds used for metabolic syndrome. These data, made publicly available, will inform the biology of body fat distribution and its relationship with disease.


Genetics ◽  
2019 ◽  
Vol 213 (4) ◽  
pp. 1225-1236 ◽  
Author(s):  
Weimiao Wu ◽  
Zhong Wang ◽  
Ke Xu ◽  
Xinyu Zhang ◽  
Amei Amei ◽  
...  

Longitudinal phenotypes have been increasingly available in genome-wide association studies (GWAS) and electronic health record-based studies for identification of genetic variants that influence complex traits over time. For longitudinal binary data, there remain significant challenges in gene mapping, including misspecification of the model for phenotype distribution due to ascertainment. Here, we propose L-BRAT (Longitudinal Binary-trait Retrospective Association Test), a retrospective, generalized estimating equation-based method for genetic association analysis of longitudinal binary outcomes. We also develop RGMMAT, a retrospective, generalized linear mixed model-based association test. Both tests are retrospective score approaches in which genotypes are treated as random conditional on phenotype and covariates. They allow both static and time-varying covariates to be included in the analysis. Through simulations, we illustrated that retrospective association tests are robust to ascertainment and other types of phenotype model misspecification, and gain power over previous association methods. We applied L-BRAT and RGMMAT to a genome-wide association analysis of repeated measures of cocaine use in a longitudinal cohort. Pathway analysis implicated association with opioid signaling and axonal guidance signaling pathways. Lastly, we replicated important pathways in an independent cocaine dependence case-control GWAS. Our results illustrate that L-BRAT is able to detect important loci and pathways in a genome scan and to provide insights into genetic architecture of cocaine use.


2019 ◽  
Vol 8 (2) ◽  
pp. 275 ◽  
Author(s):  
Eun Hong ◽  
Bong Kim ◽  
Steve Cho ◽  
Jin Yang ◽  
Hyuk Choi ◽  
...  

Genome-wide association studies found genetic variations with modulatory effects for intracranial aneurysm (IA) formations in European and Japanese populations. We aimed to identify single nucleotide polymorphisms (SNPs) associated with susceptibility to IA in a Korean population consisting of 250 patients and 294 controls using the Asian-specific Axiom Precision Medicine Research Array. Twenty-nine SNPs reached a genome-wide significance threshold (5 × 10^−8). The rs371331393 SNP, with a stop-gain function of ARHGAP32 (11q24.3), showed the most significant association with the risk of IA (OR = 43.57, 95% CI: 21.84–86.95; p = 9.3 × 10^−27). Eight of the 29 SNPs, namely GBA (rs75822236), TCF24 (rs112859779), OLFML2A (rs79134766), ARHGAP32 (rs371331393), CD163L1 (rs138525217), CUL4A (rs74115822), LOC102724084 (rs75861150), and LRRC3 (rs116969723), demonstrated statistical power greater than or equal to 0.8. Two previously reported SNPs, rs700651 (BOLL, 2q33.1) and rs6841581 (EDNRA, 4q31.22), were validated in our GWAS (Genome-wide association study). In a subsequent analysis, three SNPs showed a significant difference in expressions: the rs6741819 (RNF144A, 2p25.1) was down-regulated in the adrenal gland tissue (p = 1.5 × 10^−6), the rs1052270 (TMOD1, 9q22.33) was up-regulated in the testis tissue (p = 8.6 × 10^−10), and rs6841581 (EDNRA, 4q31.22) was up-regulated in both the esophagus (p = 5.2 × 10^−12) and skin tissues (p = 1.2 × 10^−6). Our GWAS showed novel candidate genes with Korean-specific variations in IA formations. Large population based studies are thus warranted.
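
A reported odds ratio and confidence interval can be checked against the reported p-value with a short calculation. The sketch below assumes the interval is a Wald interval that is symmetric on the log-odds scale, which the abstract does not state; the function name `wald_from_ci` is hypothetical. Under that assumption the standard error is recovered from the interval width and the two-sided p-value follows from the normal distribution.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-OR standard error, z statistic and two-sided p-value.

    Assumes ci_low and ci_high are the bounds of a Wald interval built as
    exp(log(OR) +/- z_crit * SE); z_crit defaults to the 95% critical value.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p
```

Calling `wald_from_ci(43.57, 21.84, 86.95)` on the ARHGAP32 figures quoted above should give a p-value on the order of 10^−27, in line with the reported one if the Wald assumption holds.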


2018 ◽  
Vol 13 (5) ◽  
pp. 648-658 ◽  
Author(s):  
Yoichi Kakuta ◽  
Yosuke Kawai ◽  
Takeo Naito ◽  
Atsushi Hirano ◽  
Junji Umeno ◽  
...  

Abstract Background and Aims Genome-wide association studies [GWASs] of European populations have identified numerous susceptibility loci for Crohn’s disease [CD]. Susceptibility genes differ by ethnicity, however, so GWASs specific for Asian populations are required. This study aimed to clarify the Japanese-specific genetic background for CD by a GWAS using the Japonica array [JPA] and subsequent imputation with the 1KJPN reference panel. Methods Two independent Japanese case/control sets (Tohoku region [379 CD patients, 1621 controls] and Kyushu region [334 CD patients, 462 controls]) were included. GWASs were performed separately for each population, followed by a meta-analysis. Two additional replication sets [254 + 516 CD patients and 287 + 565 controls] were analysed for top hit single nucleotide polymorphisms [SNPs] from novel genomic regions. Results Genotype data of 4 335 144 SNPs from 713 Japanese CD patients and 2083 controls were analysed. SNPs located in TNFSF15 [rs78898421, pmeta = 2.59 × 10^−26, odds ratio [OR] = 2.10], HLA-DQB1 [rs184950714, pmeta = 3.56 × 10^−19, OR = 2.05], ZNF365, and 4p14 loci were significantly associated with CD in Japanese individuals. Replication analyses were performed for four novel candidate loci [p < 1 × 10^−6], and rs488200 located upstream of RAP1A was significantly associated with CD [pcombined = 4.36 × 10^−8, OR = 1.31]. Transcriptome analysis of CD4+ effector memory T cells from lamina propria mononuclear cells of CD patients revealed a significant association of rs488200 with RAP1A expression. Conclusions RAP1A is a novel susceptibility locus for CD in the Japanese population.
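
The step of analysing each cohort separately and then combining the results can be sketched with a fixed-effect, inverse-variance weighted meta-analysis. This particular estimator is an assumption, since the abstract does not say which meta-analysis model was used, and the function name and any input values are hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Combine per-cohort effect estimates by inverse-variance weighting.

    estimates: list of (beta, se) pairs, one per cohort, where beta is the
    log odds ratio for a SNP and se its standard error.
    Returns the pooled beta, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p
```

Cohorts with smaller standard errors, typically the larger ones, contribute more to the pooled estimate, and two concordant cohorts yield a smaller p-value than either alone.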


Genome ◽  
2010 ◽  
Vol 53 (11) ◽  
pp. 876-883 ◽  
Author(s):  
Ben Hayes ◽  
Mike Goddard

Results from genome-wide association studies in livestock, and humans, have led to the conclusion that the effects of individual quantitative trait loci (QTL) on complex traits, such as yield, are likely to be small; therefore, a large number of QTL are necessary to explain genetic variation in these traits. Given this genetic architecture, gains from marker-assisted selection (MAS) programs using only a small number of DNA markers to trace a limited number of QTL are likely to be small. This has led to the development of alternative technology for using the available dense single nucleotide polymorphism (SNP) information, called genomic selection. Genomic selection uses a genome-wide panel of dense markers so that all QTL are likely to be in linkage disequilibrium with at least one SNP. The genomic breeding values are predicted to be the sum of the effect of these SNPs across the entire genome. In dairy cattle breeding, the accuracy of genomic estimated breeding values (GEBV) that can be achieved and the fact that these are available early in life have led to rapid adoption of the technology. Here, we discuss the design of experiments necessary to achieve accurate prediction of GEBV in future generations in terms of the number of markers necessary and the size of the reference population where marker effects are estimated. We also present a simple method for implementing genomic selection using a genomic relationship matrix. Future challenges discussed include using whole genome sequence data to improve the accuracy of genomic selection and management of inbreeding through genomic relationships.
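
A genomic relationship matrix can be built directly from SNP genotypes. The sketch below follows the widely used VanRaden-style construction, G = ZZ' / (2 Σ p(1 − p)), with allele frequencies estimated from the genotyped animals themselves; whether the paper uses exactly this scaling is an assumption, and the function name is hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Compute a genomic relationship matrix from 0/1/2 genotype codes.

    genotypes: list of rows, one per animal; each row holds the number of
    copies (0, 1 or 2) of a reference allele at every SNP.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency at each SNP, estimated from the sample itself.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    # Centre each genotype by its expected value 2p.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(ri, rk)) / scale for rk in centered]
        for ri in centered
    ]
```

In a GBLUP-type analysis this matrix takes the place of the pedigree-based relationship matrix, so breeding values of animals without phenotypes are predicted through their genomic similarity to the reference population.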


2015 ◽  
Author(s):  
Tim B Bigdeli ◽  
Donghyung Lee ◽  
Brien P Riley ◽  
Vladimir I Vladimirov ◽  
Ayman H Fanous ◽  
...  

Genome scans, including both genome-wide association studies and deep sequencing, continue to discover a growing number of significant association signals for various traits. However, often variants meeting genome-wide significance criteria explain far less of the overall trait variance than “sub-threshold” association signals. To extract these sub-threshold signals, there is a need for methods which accurately estimate the mean of all (normally-distributed) test-statistics from a genome scan (i.e., Z-scores). This is currently achieved by the difficult procedures of adjusting all Z-score (χ_1^2) statistics for “winner’s curse” (multiple testing). Given that multiple testing adjustments are much simpler for p-values, we propose a method for estimating Z-score means by i) first adjusting their p-values for multiple testing and then ii) transforming the adjusted p-values to upper tail Z-scores with the sign of the original statistics. Because a False Discovery Rate (FDR) procedure is used for multiple testing adjustment, we denote this method FDR Inverse Quantile Transformation (FIQT). When compared to competitors, e.g. Empirical Bayes (including proposed improvements), FIQT is more i) accurate and ii) computationally efficient by orders of magnitude. Its accuracy advantage is substantial at larger sample sizes and/or moderate numbers of association signals. Practical application of FIQT to Z-scores from the first Psychiatric Genetic Consortium (PGC) schizophrenia predicts a non-trivial fraction of the significant signal regions from the subsequent published PGC schizophrenia studies. Finally, we suggest that FIQT might be i) used to improve subject level risk prediction and ii) further improved by modelling the noncentrality of χ_1^2 statistics.
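
The two steps of FIQT described above translate almost directly into code. The sketch below assumes two-sided p-values and the Benjamini-Hochberg procedure as the FDR adjustment; the abstract only says "a False Discovery Rate (FDR) procedure", so that choice, like the function names, is an assumption rather than the authors' implementation.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fiqt(z_scores):
    """Shrink Z-scores: FDR-adjust their p-values, then map back to Z.

    Extremely large |z| values whose p-value underflows to zero are not
    handled here and would make inv_cdf raise an error.
    """
    pvalues = [2.0 * _STD_NORMAL.cdf(-abs(z)) for z in z_scores]
    adjusted = bh_adjust(pvalues)
    # Upper-tail quantile of the adjusted p-value, signed like the original.
    return [
        math.copysign(-_STD_NORMAL.inv_cdf(q / 2.0), z)
        for q, z in zip(adjusted, z_scores)
    ]
```

Because an adjusted p-value is never smaller than the raw one, every returned Z-score is pulled toward zero, which is the shrinkage that counteracts the winner's curse.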

