Excess false positive rate caused by population stratification and disease rate heterogeneity in case–control association studies

2009 ◽  
Vol 53 (5) ◽  
pp. 1767-1781 ◽  
Author(s):  
Zhaohai Li ◽  
Hong Zhang ◽  
Gang Zheng ◽  
Joseph L. Gastwirth ◽  
Mitchell H. Gail
2018 ◽  
Author(s):  
Cox Lwaka Tamba ◽  
Yuan-Ming Zhang

Abstract

Background: Recent developments in technology have led to the generation of big data. In genome-wide association studies (GWAS), tens of millions of SNPs may need to be tested for association with a trait of interest, which poses a great computational challenge. Fast algorithms are therefore needed in GWAS methodology, and they must ensure high power in QTN detection, high accuracy in QTN effect estimation and a low false positive rate.

Results: Here, we accelerated the mrMLM algorithm by using the idea behind GEMMA together with matrix transformations and identities. The target functions and derivatives, which are in vector/matrix form for each marker scan, are transformed into simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are then evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. It also shows high statistical power in QTN detection, high accuracy in QTN effect estimation and a low false positive rate compared with GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM detected more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM.

Conclusions: FASTmrMLM is a fast and reliable algorithm for multi-locus GWAS that ensures high statistical power, high accuracy of estimates and a low false positive rate.

Author Summary: Current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, tens of millions of markers may need to be tested for association with a trait of interest. Because of this computational challenge, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first stage, we used matrix transformations and identities to speed up the testing of each random marker effect. The target functions and derivatives, which are in vector/matrix form for each marker scan, are transformed into simple forms that are easy and efficient to evaluate during each optimization step. In the second stage, we selected all potentially associated SNPs and evaluated them in a multi-locus model. In simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.
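The first-stage screen described above can be illustrated with a minimal sketch. This is not the FASTmrMLM implementation: as a stand-in for the mixed-model test of each random marker effect, it assumes a simple marginal regression with a normal approximation to the test statistic. The function names `marginal_p_value` and `screen_markers` and the simulated data are hypothetical. Only the P ≤ 0.01 threshold comes from the abstract.

```python
import math
import random


def marginal_p_value(x, y):
    """Two-sided p-value for the slope of y on x (normal approximation).

    Assumption: a plain single-marker regression replaces the mixed-model
    test used by mrMLM/FASTmrMLM.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic marker or constant trait: no evidence
    r = sxy / math.sqrt(sxx * syy)
    r = max(min(r, 0.999999), -0.999999)  # guard against division by zero
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def screen_markers(genotypes, trait, threshold=0.01):
    """Stage 1: indices of markers whose p-value is <= threshold."""
    return [j for j, g in enumerate(genotypes)
            if marginal_p_value(g, trait) <= threshold]


# Hypothetical simulated data: 50 markers, 200 individuals, one causal marker.
random.seed(1)
n = 200
genotypes = [[random.choice([0, 1, 2]) for _ in range(n)] for _ in range(50)]
trait = [2.0 * genotypes[3][i] + random.gauss(0, 1) for i in range(n)]

# Candidates that would be passed on to the second-stage multi-locus model.
candidates = screen_markers(genotypes, trait)
```

The deliberately lenient threshold keeps weak signals alive for the second stage, where the joint model (LARS and/or EM-Empirical Bayes in the abstract) decides which candidates survive.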


2016 ◽  
Author(s):  
Weikang Gong ◽  
Lin Wan ◽  
Wenlian Lu ◽  
Liang Ma ◽  
Fan Cheng ◽  
...  

Abstract

The identification of connexel-wise associations, which involves examining functional connectivities between pairs of voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression [Cheng et al., 2015a,b, 2016], multiple-comparison correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis tests using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and false discovery rate (FDR) control, it can reduce the false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method considerably reduces the computational complexity of a permutation- or simulation-based approach, so it can efficiently handle large datasets with ultra-high-resolution images. The utility of our method is shown in a case-control study: our approach can identify altered functional connectivities in a major depressive disorder dataset, whereas existing methods failed. A software package is available at https://github.com/weikanggong/BWAS.
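For orientation, the two baseline corrections named in the abstract can be sketched as follows. This shows only Bonferroni correction and the Benjamini-Hochberg FDR procedure, not the random-field method of the paper. The function names and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for tests with p <= alpha / m (controls the FWER)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= alpha * k / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below the cutoff.
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for ten tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
```

Both procedures treat the tests as an unordered list, which is why a method that uses the spatial structure of the images can gain power over them.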


2021 ◽  
Author(s):  
Baptiste Couvy-Duchesne ◽  
Futao Zhang ◽  
Kathryn E. Kemper ◽  
Julia Sidorenko ◽  
Naomi R. Wray ◽  
...  

Abstract

Covariance between grey-matter measurements can reflect structural or functional brain networks, though it has also been shown to be influenced by confounding factors (e.g. age, head size, scanner), which could lead to lower mapping precision (increased size of associated clusters) and create distal false-positive associations in mass-univariate vertex-wise analyses. We evaluated this concern by performing state-of-the-art mass-univariate analyses (general linear model, GLM) on traits simulated from real vertex-wise grey matter data (including cortical and subcortical thickness and surface area). We contrasted the results with those from linear mixed models (LMMs), which have been shown to overcome similar issues in omics association studies. We showed that when performed on a large sample (N=8,662, UK Biobank), GLMs yielded large spatial clusters of significant vertices and a greatly inflated false positive rate (family-wise error rate: FWER=1, cluster false discovery rate: FDR>0.6). We showed that LMMs gave more parsimonious results, with smaller clusters and a reduced false positive rate (yet FWER>5% after Bonferroni correction), but at the cost of increased computation. In practice, the parsimony of LMMs results from controlling for the joint effect of all vertices, which prevents local and distal redundant associations from reaching significance. Next, we performed mass-univariate association analyses on five real UKB traits (age, sex, BMI, fluid intelligence and smoking status), and LMMs yielded fewer and more localised associations. We identified 19 significant clusters displaying small associations with age, sex and BMI, which suggests a complex architecture of at least dozens of areas associated with those phenotypes.
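The two error rates reported above can be made concrete with a small sketch of how they are typically estimated from simulation replicates. It assumes each replicate is summarised by a hypothetical pair (true positives, false positives). The function names and the example counts are illustrative and are not taken from the study.

```python
def empirical_fwer(replicates):
    """Share of replicates with at least one false positive."""
    return sum(1 for _, fp in replicates if fp > 0) / len(replicates)


def empirical_fdr(replicates):
    """Mean false discovery proportion, counted as 0 when nothing is discovered."""
    proportions = []
    for tp, fp in replicates:
        discoveries = tp + fp
        proportions.append(fp / discoveries if discoveries else 0.0)
    return sum(proportions) / len(proportions)


# Hypothetical (true positives, false positives) counts for four replicates.
replicates = [(3, 0), (2, 2), (0, 0), (1, 3)]
fwer = empirical_fwer(replicates)
fdr = empirical_fdr(replicates)
```

Under these definitions, FWER=1 means that every simulated replicate contained at least one false-positive finding.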


2012 ◽  
Vol 36 (3) ◽  
pp. 195-205 ◽  
Author(s):  
Michael P. Epstein ◽  
Richard Duncan ◽  
K. Alaine Broadaway ◽  
Min He ◽  
Andrew S. Allen ◽  
...  

2004 ◽  
Vol 58 (1) ◽  
pp. 40-48 ◽  
Author(s):  
Prakash Gorroochurn ◽  
Susan E. Hodge ◽  
Gary Heiman ◽  
David A. Greenberg

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Wenhan Chen ◽  
Yang Wu ◽  
Zhili Zheng ◽  
Ting Qi ◽  
Peter M. Visscher ◽  
...  

Abstract

Summary statistics from genome-wide association studies (GWAS) have facilitated the development of various summary data-based methods, which typically require a reference sample for linkage disequilibrium (LD) estimation. Analyses using these methods may be biased by errors in the GWAS summary data or the LD reference, or by heterogeneity between the GWAS and the LD reference. Here we propose a quality control method, DENTIST, that leverages LD among genetic variants to detect and eliminate errors in the GWAS or LD reference and heterogeneity between the two. Through simulations, we demonstrate that DENTIST substantially reduces the false-positive rate in detecting secondary signals in summary-data-based conditional and joint association analysis, especially for imputed rare variants (false-positive rate reduced from >28% to <2% in the presence of heterogeneity between GWAS and LD reference). We further show that DENTIST can improve other summary-data-based analyses such as fine-mapping analysis.
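The general idea of using LD to check summary statistics for consistency can be illustrated with a deliberately simplified sketch. It is not the DENTIST algorithm. It assumes a single neighbouring variant and the standard approximation that, given the neighbour's z-score and an LD correlation r, the target z-score is normal with mean r times the neighbour's z-score and variance 1 - r^2. The function name `ld_consistency_test` and the example values are hypothetical.

```python
import math


def ld_consistency_test(z_target, z_neighbor, r):
    """Chi-square (1 df) test of whether z_target agrees with its LD prediction.

    Assumption: one neighbouring variant only, with LD correlation r taken
    from a reference panel.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    predicted = r * z_neighbor
    stat = (z_target - predicted) ** 2 / (1.0 - r * r)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical values: the first pair agrees with the LD reference,
# the second has the wrong sign relative to its neighbour.
consistent = ld_consistency_test(3.0, 6.0, 0.5)
inconsistent = ld_consistency_test(-3.0, 6.0, 0.5)
```

A very small p-value flags a variant whose reported statistic disagrees with what LD predicts, for example because of an allele-coding error or a mismatch between the GWAS sample and the LD reference.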


2004 ◽  
Vol 58 (1) ◽  
pp. 30-39 ◽  
Author(s):  
Gary A. Heiman ◽  
Susan E. Hodge ◽  
Prakash Gorroochurn ◽  
Junying Zhang ◽  
David A. Greenberg
