The multiple testing burden in sequencing-based disease studies of global populations

2016
Author(s): Sara L. Pulit, Sera A.J. de With, Paul I.W. de Bakker

Genome-wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype-phenotype associations. Currently, studies of common disease are rapidly shifting towards the use of sequencing technologies. As the cost of sequencing drops, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping-based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome-wide significance for various analysis scenarios. Using whole-genome sequence data, we simulated sequencing-based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples should practically employ a genome-wide significance threshold of p < 5 × 10−9, though the threshold does vary with ancestry. Studies of European or East Asian ancestry should set genome-wide significance at approximately p < 5 × 10−9, but similar studies of African or South Asian samples should be more stringent (p < 1 × 10−9). Because sequencing analysis brings with it many challenges (especially for rare variants), appropriate adoption of a revised multiple test correction will be crucial to avoid irreproducible claims of association.
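Under a Bonferroni correction, the per-test threshold is the family-wise error rate α divided by the number of independent tests, so each threshold above implies an effective number of tests. The sketch below assumes α = 0.05 and a plain Bonferroni relation; it only illustrates the arithmetic, since the study derived its thresholds empirically from simulated sequencing data rather than from this formula.

```python
# Back-of-the-envelope Bonferroni arithmetic; assumes a family-wise alpha of 0.05.
# The study itself derived its thresholds empirically from simulated data.


def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test p-value threshold that controls the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_tests(alpha: float, threshold: float) -> float:
    """Number of independent tests that a given per-test threshold corrects for."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


if __name__ == "__main__":
    thresholds = {
        "common-variant GWAS convention": 5e-8,
        "European / East Asian sequencing": 5e-9,
        "African / South Asian sequencing": 1e-9,
    }
    for label, thr in thresholds.items():
        n = implied_tests(0.05, thr)
        print(f"{label}: p < {thr:g} ~ {n:,.0f} independent tests")
```

With α = 0.05, the conventional 5 × 10−8 corresponds to about one million independent tests, 5 × 10−9 to about ten million, and 1 × 10−9 to about fifty million.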

2017
Author(s): Carlo Maj, Elena Milanesi, Massimo Gennarelli, Luciano Milanesi, Ivan Merelli

In complex phenotypes (e.g., psychiatric diseases), single-locus tests, commonly performed in genome-wide association studies, have proven limited in discovering strong gene associations. A growing body of evidence suggests that epistatic non-linear effects may be responsible for complex phenotypes arising from the interaction of different biological factors. A major issue in epistasis analysis is the computational burden due to the huge number of statistical tests to be performed when considering all the potential genotype combinations. In this work, we developed a computationally efficient pipeline to investigate the presence of epistasis at a genome-wide scale in bipolar disorder, which is a typical example of a complex phenotype with a relevant but unexplained genetic background. By running our pipeline, we identified 13 epistatic interactions between variants located in genes potentially involved in biological processes associated with the analyzed phenotype.
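The computational burden of an exhaustive epistasis scan comes from combinatorics: n variants yield n(n − 1)/2 pairwise tests, which also tightens any multiple-testing correction. The sketch below counts pairs and computes a Bonferroni threshold for such a scan; the 500,000-marker figure is hypothetical, and the code does not reproduce the authors' pipeline, which the abstract does not describe in detail.

```python
from itertools import combinations
from math import comb


def n_pairwise_tests(n_variants: int) -> int:
    """Number of distinct variant pairs in an exhaustive two-locus scan."""
    return comb(n_variants, 2)


def pairwise_bonferroni(n_variants: int, alpha: float = 0.05) -> float:
    """Bonferroni per-test threshold for an exhaustive pairwise scan."""
    return alpha / n_pairwise_tests(n_variants)


def enumerate_pairs(variant_ids):
    """Yield every unordered variant pair once (the loop an exhaustive scan runs)."""
    yield from combinations(variant_ids, 2)


if __name__ == "__main__":
    n = 500_000  # hypothetical marker count, not taken from the study
    print(f"{n:,} variants -> {n_pairwise_tests(n):,} pairwise tests")
    print(f"Bonferroni threshold at alpha=0.05: {pairwise_bonferroni(n):.1e}")
```

Even a modest marker panel produces on the order of 10^11 pairs, which is why filtering strategies and efficient implementations matter for genome-wide epistasis analysis.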


2021
Author(s): Steven Gazal, Omer Weissbrod, Farhad Hormozdiari, Kushal Dey, Joseph Nasser, ...

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. 
Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies. 
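Precision and recall here measure how many predicted SNP-gene links are correct and how many true links are recovered. The study defines both through polygenic heritability analyses; the sketch below is a deliberately simplified, set-based analogue on hypothetical links, plus a helper for the "top-k genes explain a given share of heritability" summary. Function names and inputs are illustrative assumptions.

```python
def precision_recall(predicted: set, truth: set) -> tuple[float, float]:
    """Set-based precision and recall of predicted SNP-gene links."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def top_k_share(h2_by_gene: dict, k: int) -> float:
    """Fraction of gene-linked heritability captured by the k highest-ranked genes."""
    values = sorted(h2_by_gene.values(), reverse=True)
    total = sum(values)
    return sum(values[:k]) / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical links; the real evaluation uses heritability-based estimates.
    predicted = {("rs1", "GENE_A"), ("rs2", "GENE_B"), ("rs3", "GENE_C"), ("rs4", "GENE_D")}
    truth = {("rs1", "GENE_A"), ("rs2", "GENE_B"), ("rs3", "GENE_X"),
             ("rs5", "GENE_E"), ("rs6", "GENE_F"), ("rs7", "GENE_G")}
    p, r = precision_recall(predicted, truth)
    print(f"precision={p:.2f} recall={r:.2f}")
    # Hypothetical per-gene heritability values.
    print(top_k_share({"GENE_A": 0.5, "GENE_B": 0.3, "GENE_C": 0.2}, k=1))
```

Taking the union of several strategies' links tends to raise recall at some cost in precision, which is why a combined strategy has to weigh its constituents rather than simply pool them.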


2019, Vol 40 (5), pp. 661-668
Author(s): Asahi Hishida, Tomotaka Ugai, Ryosuke Fujii, Masahiro Nakatochi, Michael C Wu, ...

Although recent genome-wide association studies (GWASs) have identified genetic variants associated with Helicobacter pylori (HP)-induced gastric cancer, few studies have examined the genetic traits associated with the risk of HP-induced gastric precancerous conditions. This study aimed to elucidate genetic variants associated with these conditions using a genome-wide approach. Data from four sites of the Japan Multi-Institutional Collaborative Cohort (J-MICC) Study were used in the discovery phase (Stage I), and two datasets from the Hospital-based Epidemiologic Research Program at Aichi Cancer Center 2 (HERPACC2) study were used in the replication phases (Stages II and III). SKAT (SNP-set Kernel Association Test) and single variant-based GWASs were conducted for the risks of gastric atrophy (GA) and severe GA, defined by serum pepsinogen (PG) levels, and for PG1 levels and PG1/2 ratios. In the gene-based SKAT in Stage I, prostate stem cell antigen (PSCA) was significantly associated with the risks of GA and severe GA and with serum PG1/2 level using a linear kernel [false discovery rate (FDR) = 0.011, 0.230 and 7.2 × 10−7, respectively]. The single variant-based GWAS revealed that nine PSCA single nucleotide polymorphisms (SNPs) fulfilled the genome-wide significance level (P < 5 × 10−8) for the risks of both GA and severe GA in the combined study, although most of these associations did not reach genome-wide significance in the discovery or validation cohort on their own. GWAS for serum PG1 levels and PG1/2 ratios revealed that the PSCA rs2920283 SNP had a striking P-value of 4.31 × 10−27 for PG1/2 ratios. The present GWAS identified the PSCA locus as the most significant locus for the risk of HP-induced GA, confirming the recently reported association in Europeans.
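The SKAT results are reported as false discovery rates. The abstract does not name the FDR procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment, applied to hypothetical gene-level p-values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical gene-level p-values, not values from the study.
    pvals = [0.01, 0.04, 0.03, 0.20]
    for p, q in zip(pvals, benjamini_hochberg(pvals)):
        print(f"p={p:.2f} -> FDR q={q:.4f}")
```

The running minimum is what makes this a step-up procedure: a gene's q-value can never exceed that of a gene with a larger raw p-value.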


2021, Vol 11 (1)
Author(s): Daichi Shigemizu, Risa Mitsumori, Shintaro Akiyama, Akinori Miyashita, Takashi Morizono, ...

Alzheimer’s disease (AD) has no cure, but early detection and risk prediction could allow earlier intervention. Genetic risk factors may differ between ethnic populations. To discover novel susceptibility loci of AD in the Japanese population, we conducted a genome-wide association study (GWAS) with 3962 AD cases and 4074 controls. Out of 4,852,957 genetic markers that passed stringent quality control filters, 134 in nine loci, including APOE and SORL1, were convincingly associated with AD. Lead SNPs located in seven novel loci were genotyped in an independent Japanese AD case–control cohort. The novel locus FAM47E reached genome-wide significance in a meta-analysis of association results. This is the first report associating the FAM47E locus with AD in the Japanese population. A trans-ethnic meta-analysis combining the results of the Japanese data sets with summary statistics from stage 1 data of the International Genomics of Alzheimer’s Project identified an additional novel susceptibility locus in OR2B2. Our data highlight the importance of performing GWAS in non-European populations.
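Combining the Japanese discovery and replication results, and later summary statistics from another consortium, is a meta-analysis. The abstract does not state the model used; a common choice, assumed in the sketch below, is a fixed-effects inverse-variance-weighted meta-analysis of per-cohort effect sizes and standard errors, followed by a check against the conventional genome-wide threshold of 5 × 10−8 (also an assumption here). The input values are hypothetical.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis of per-study effects.

    Returns (pooled beta, pooled standard error, z statistic, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty lists of betas and SEs")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical log-odds ratios from a discovery and a replication cohort.
    beta, se, z, p = ivw_fixed_effects([0.25, 0.18], [0.05, 0.07])
    print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e}")
    print("genome-wide significant" if p < 5e-8 else "not genome-wide significant")
```

A trans-ethnic meta-analysis may prefer a random-effects or ancestry-aware model when effect sizes differ between populations; the fixed-effects form is only the simplest starting point.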


2020, Vol 66 (1), pp. 11-23
Author(s): Yukihide Momozawa, Keijiro Mizukami

Genome-wide association studies have identified >10,000 genetic variants associated with various phenotypes and diseases. Although the majority are common variants, rare variants with a minor allele frequency above 0.1% have also been investigated through imputation and disease-specific custom SNP arrays. Analyses of rare variants, mainly by sequencing, have revealed that they play unique roles in the genetics of complex human diseases because of features that distinguish them from common variants. These roles include providing hypothesis-free evidence for gene causality, precise targets for functional analyses aimed at understanding disease mechanisms, favorable new targets for drug development, and genetic markers of high disease risk for personalized medicine. As whole-genome sequencing continues to identify more rare variants, the roles attributed to them will also grow. However, better estimates of the functional impact of rare variants across the whole genome are needed to enhance their contribution to improvements in human health.
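The distinction between rare and common variants rests on the minor allele frequency (MAF). As a minimal sketch of that quantity, the code below computes MAF from diploid genotype counts and bins it; the 1% and 5% cutoffs are widely used conventions assumed here, not definitions taken from the abstract, and the genotype counts are hypothetical.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF of a biallelic variant from diploid genotype counts."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotyped individuals")
    alt_freq = (n_het + 2 * n_hom_alt) / n_alleles
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float, rare_cutoff: float = 0.01, low_cutoff: float = 0.05) -> str:
    """Label a variant by MAF; the cutoffs are common conventions, not fixed rules."""
    if maf < rare_cutoff:
        return "rare"
    if maf < low_cutoff:
        return "low-frequency"
    return "common"


if __name__ == "__main__":
    # Hypothetical genotype counts (hom-ref, het, hom-alt) in 10,000 individuals.
    for counts in [(9_990, 10, 0), (9_400, 580, 20), (6_000, 3_000, 1_000)]:
        maf = minor_allele_frequency(*counts)
        print(counts, f"MAF={maf:.4f}", frequency_class(maf))
```

A variant carried by only ten heterozygotes among 10,000 people has a MAF of 0.05%, below the 0.1% range the abstract mentions for imputation and custom arrays, which is where direct sequencing becomes necessary.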


Author(s): Joseph Nasser, Drew T. Bergman, Charles P. Fulco, Philine Guckelberger, Benjamin R. Doughty, ...

Genome-wide association studies have now identified tens of thousands of noncoding loci associated with human diseases and complex traits, each of which could reveal insights into biological mechanisms of disease. Many of the underlying causal variants are thought to affect enhancers, but we have lacked genome-wide maps of enhancer-gene regulation to interpret such variants. We previously developed the Activity-by-Contact (ABC) Model to predict enhancer-gene connections and demonstrated that it can accurately predict the results of CRISPR perturbations across several cell types. Here, we apply this ABC Model to create enhancer-gene maps in 131 cell types and tissues, and use these maps to interpret the functions of fine-mapped GWAS variants. For inflammatory bowel disease (IBD), causal variants are >20-fold enriched in enhancers in particular cell types, and ABC outperforms other regulatory methods at connecting noncoding variants to target genes. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes via variants in enhancers that act in different cell types. Guided by these variant-to-function maps, we show that an enhancer containing an IBD risk variant regulates the expression of PPIF to tune mitochondrial membrane potential. Together, our study reveals insights into principles of genome regulation, illuminates mechanisms that influence IBD, and demonstrates a generalizable strategy to connect common disease risk variants to their molecular and cellular functions.
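In earlier published work on the ABC Model, an element-gene pair is scored by the element's activity multiplied by its contact with the gene, normalized by the same product summed over all candidate elements for that gene; activity is the geometric mean of DNase-seq and H3K27ac ChIP-seq signal, and contact comes from Hi-C. The sketch below implements only that core ratio for a single gene. It omits the distance window, contact normalization, and other details of the real model, the class and function names are hypothetical, and the cutoff is a user-chosen parameter rather than the value used in the study.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateElement:
    name: str
    dnase: float    # DNase-seq signal at the element
    h3k27ac: float  # H3K27ac ChIP-seq signal at the element
    contact: float  # contact frequency (e.g., from Hi-C) with the gene's promoter


def abc_scores(elements):
    """ABC score of each candidate element for ONE gene.

    Activity is the geometric mean of DNase and H3K27ac signal; the score is
    activity * contact, normalized by the sum over all candidate elements.
    """
    products = [math.sqrt(e.dnase * e.h3k27ac) * e.contact for e in elements]
    total = sum(products)
    return {e.name: (p / total if total > 0 else 0.0) for e, p in zip(elements, products)}


def predicted_links(gene, elements, cutoff):
    """Element-gene pairs whose ABC score meets a user-chosen cutoff."""
    return [(name, gene) for name, s in abc_scores(elements).items() if s >= cutoff]


if __name__ == "__main__":
    # Hypothetical elements around a hypothetical gene.
    elems = [
        CandidateElement("enh1", dnase=4.0, h3k27ac=9.0, contact=0.5),
        CandidateElement("enh2", dnase=1.0, h3k27ac=1.0, contact=1.0),
        CandidateElement("enh3", dnase=0.1, h3k27ac=0.1, contact=0.01),
    ]
    print(abc_scores(elems))
    print(predicted_links("GENE_A", elems, cutoff=0.02))
```

Because scores are normalized per gene, a weak but nearby element can still score highly for a gene with few active elements, while a strong element competes with many others for a gene in a dense regulatory region.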


2019, Vol 22 (8), pp. 1063-1069
Author(s): N. S. Yudin, N. L. Podkolodnyy, T. A. Agarkova, E. V. Ignatieva

Selection by means of genetic markers is a promising approach to the eradication of infectious diseases in farm animals, especially in the absence of effective methods of treatment and prevention. Bovine leukemia virus (BLV) is spread throughout the world and represents one of the biggest problems for livestock production and food security in Russia. However, recent genome-wide association studies have shown that sensitivity/resistance to BLV is polygenic. The aim of this study was to create a catalog of cattle genes and genes of other mammalian species involved in the pathogenesis of BLV-induced infection and to perform gene prioritization using bioinformatics methods. Based on manually collected information from a range of open sources, a total of 446 genes were included in the catalog of cattle genes and genes of other mammals involved in the pathogenesis of BLV-induced infection. The following criteria were used to prioritize the 446 genes from the catalog: (1) the gene is associated with leukemia according to a genome-wide association study; (2) the gene is associated with leukemia according to a case-control study; (3) the role of the gene in leukemia development has been studied using knockout mice; (4) protein-protein interactions exist between the gene-encoded protein and either viral particles or individual viral proteins; (5) the gene is annotated with Gene Ontology terms that are over-represented for a given list of genes; (6) the gene participates in biological pathways from the KEGG or REACTOME databases that are over-represented for a given list of genes; (7) the protein encoded by the gene has a high number of protein-protein interactions with proteins encoded by other genes from the catalog. Based on each criterion, a rank was assigned to each gene. The ranks were then summed and an overall rank was determined.
Prioritization of 446 candidate genes allowed us to identify 5 genes of interest (TNF, LTB, BOLA-DQA1, BOLA-DRB3, ATF2), which can affect the sensitivity/resistance of cattle to leukemia.
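Each criterion assigns a rank to every gene, and the per-criterion ranks are combined into an overall rank. The sketch below is a minimal rank-sum aggregation under assumptions the abstract does not specify: higher criterion scores are better, tied genes share their average rank, a missing score counts as 0, and a lower rank sum means higher priority. Gene names and scores are hypothetical.

```python
def fractional_ranks(scores: dict, genes: list) -> dict:
    """Rank genes by descending score (1 = best); tied genes share their average rank."""
    ordered = sorted(genes, key=lambda g: scores.get(g, 0.0), reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        value = scores.get(ordered[i], 0.0)
        # Extend j over the block of genes tied at this score.
        while j + 1 < len(ordered) and scores.get(ordered[j + 1], 0.0) == value:
            j += 1
        average_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def overall_ranking(scores_by_criterion: dict) -> tuple[list, dict]:
    """Sum per-criterion ranks; a lower rank sum means a higher overall priority."""
    genes = sorted(set().union(*scores_by_criterion.values()))
    rank_sums = {g: 0.0 for g in genes}
    for scores in scores_by_criterion.values():
        for gene, rank in fractional_ranks(scores, genes).items():
            rank_sums[gene] += rank
    return sorted(genes, key=lambda g: (rank_sums[g], g)), rank_sums


if __name__ == "__main__":
    # Hypothetical criteria: a binary GWAS flag and a protein-interaction count.
    criteria = {
        "gwas_hit": {"GENE_A": 1, "GENE_B": 1, "GENE_C": 0},
        "ppi_degree": {"GENE_A": 12, "GENE_B": 3, "GENE_C": 5},
    }
    order, sums = overall_ranking(criteria)
    print(order, sums)
```

Rank summing gives every criterion equal weight regardless of its scale, which is convenient when mixing binary evidence (e.g., a GWAS hit) with counts (e.g., interaction degree).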


2018, Vol 28 (1), pp. 166-174
Author(s): Sara L Pulit, Charli Stoneman, Andrew P Morris, Andrew R Wood, Craig A Glastonbury, ...

More than one in three adults worldwide is either overweight or obese. Epidemiological studies indicate that the location and distribution of excess fat, rather than general adiposity, are more informative for predicting risk of obesity sequelae, including cardiometabolic disease and cancer. We performed a genome-wide association study meta-analysis of body fat distribution, measured by waist-to-hip ratio (WHR) adjusted for body mass index (WHRadjBMI), and identified 463 signals in 346 loci. Heritability and variant effects were generally stronger in women than men, and we found approximately one-third of all signals to be sexually dimorphic. The 5% of individuals carrying the most WHRadjBMI-increasing alleles were 1.62 times more likely than the bottom 5% to have a WHR above the thresholds used for metabolic syndrome. These data, made publicly available, will inform the biology of body fat distribution and its relationship with disease.
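The top-versus-bottom 5% comparison can be illustrated with a count of trait-increasing alleles and a binary outcome (WHR above a metabolic-syndrome threshold). The abstract does not say whether "1.62 times more likely" is a prevalence ratio or an odds ratio, nor how alleles were weighted; the sketch assumes an unweighted allele count and a prevalence ratio, and simulates a hypothetical cohort with a toy outcome model.

```python
import random


def allele_count_score(genotypes):
    """Unweighted count of trait-increasing alleles (0, 1 or 2 per variant)."""
    return sum(genotypes)


def top_bottom_prevalence_ratio(scores, outcomes, fraction=0.05):
    """Outcome prevalence in the top `fraction` of scores divided by that in the bottom."""
    n = len(scores)
    if n == 0 or n != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and equal in length")
    k = max(1, int(n * fraction))
    # Ties at the group boundaries are broken by input order (sorted() is stable).
    order = sorted(range(n), key=lambda i: scores[i])
    bottom, top = order[:k], order[-k:]
    prevalence_bottom = sum(outcomes[i] for i in bottom) / k
    prevalence_top = sum(outcomes[i] for i in top) / k
    if prevalence_bottom == 0:
        raise ZeroDivisionError("no outcome cases in the bottom group")
    return prevalence_top / prevalence_bottom


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical cohort: 2,000 people genotyped at 50 trait-increasing variants.
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(50)] for _ in range(2_000)]
    scores = [allele_count_score(g) for g in genotypes]
    # Toy outcome model: the chance of a WHR above threshold rises with the score.
    outcomes = [int(rng.random() < 0.2 + 0.004 * (s - 50)) for s in scores]
    ratio = top_bottom_prevalence_ratio(scores, outcomes)
    print(f"top 5% vs bottom 5% prevalence ratio: {ratio:.2f}")
```

In a real analysis the score would typically weight each allele by its estimated effect, and sex-specific WHR thresholds would define the outcome.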

