Excess false positive rate caused by population stratification and disease rate heterogeneity in case–control association studies

Zhaohai Li; Hong Zhang; Gang Zheng; Joseph L. Gastwirth; Mitchell H. Gail

doi:10.1016/j.csda.2008.02.021

Studying the Joint Effects of Population Stratification and Sampling in Case-Control Association Studies

Human Heredity ◽ 10.1159/000297658 ◽ 2010 ◽ Vol 69 (4) ◽ pp. 254-261 ◽ Cited By ~ 1

Author(s): K.F. Cheng ◽ J.Y. Lee ◽ J.H. Chen

Keyword(s): Population Stratification ◽ Association Studies ◽ Case Control ◽ Joint Effects ◽ Control Association

A fast mrMLM algorithm for multi-locus genome-wide association studies

10.1101/341784 ◽ 2018 ◽ Cited By ~ 23

Author(s): Cox Lwaka Tamba ◽ Yuan-Ming Zhang

Keyword(s): False Positive ◽ Statistical Power ◽ Association Studies ◽ False Positive Rate ◽ Real Data ◽ High Accuracy ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Genome Wide ◽ Positive Rate

Abstract

Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), we can get tens of millions of SNPs that need to be tested for association with a trait of interest. This poses a great computational challenge, so fast algorithms are needed in GWAS methodologies. These algorithms must ensure high power in QTN detection, high accuracy in QTN estimation and a low false positive rate.

Results: Here, we accelerated the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scan are transformed into simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN estimation and a low false positive rate compared with GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM.

Conclusions: FASTmrMLM is a fast and reliable algorithm for multi-locus GWAS and ensures high statistical power, high accuracy of estimates and a low false positive rate.

Author Summary: The current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, we can get tens of millions of markers that need to be tested for association with a trait of interest. Because of this computational challenge, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first step, we used matrix transformations and identities to speed up the testing of each random marker effect. The target functions and derivatives, which are in vector/matrix forms for each marker scan, are transformed into simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. In simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.
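
The two-stage structure described in the Author Summary can be illustrated with a minimal sketch. This is not FASTmrMLM: a plain correlation test stands in for the per-marker mixed-model scan, and ordinary least squares stands in for the LARS / EM-Empirical Bayes step. Only the P ≤ 0.01 screening threshold comes from the abstract; all function names and the simulated data are hypothetical.

```python
import math
import random

def marginal_p_value(x, y):
    """Two-sided p-value for the marker-trait correlation (Fisher z approximation)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # a constant marker or trait carries no signal
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / math.sqrt(sxx * syy)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def solve(a, b):
    """Solve the linear system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta

def two_stage_scan(markers, y, threshold=0.01):
    """Stage 1: keep markers with p <= threshold. Stage 2: joint least-squares fit."""
    selected = [j for j, x in enumerate(markers) if marginal_p_value(x, y) <= threshold]
    if not selected:
        return {}
    n = len(y)
    cols = [[1.0] * n] + [markers[j] for j in selected]  # intercept + selected markers
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    return dict(zip(selected, beta[1:]))  # marker index -> estimated effect

# Hypothetical simulated data: 50 markers coded 0/1/2, one causal marker with effect 1.0.
random.seed(1)
n, n_markers, causal = 200, 50, 3
markers = [[float(random.choice((0, 1, 2))) for _ in range(n)] for _ in range(n_markers)]
y = [markers[causal][i] + random.gauss(0, 1) for i in range(n)]
effects = two_stage_scan(markers, y)
```

The sketch assumes the selected markers are not perfectly collinear; a real multi-locus method would use a penalized or Bayesian fit precisely because that assumption often fails.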

Statistical testing and power analysis for brain-wide association study

10.1101/089870 ◽ 2016 ◽ Cited By ~ 1

Author(s): Weikang Gong ◽ Lin Wan ◽ Wenlian Lu ◽ Liang Ma ◽ Fan Cheng ◽ ...

Keyword(s): False Positive ◽ Power Analysis ◽ Statistical Power ◽ Spatial Information ◽ Association Studies ◽ False Positive Rate ◽ Gaussian Random Field ◽ Resting State fMRI ◽ Statistical Testing ◽ Positive Rate

Abstract: The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression [Cheng et al., 2015a,b, 2016], multiple-correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis tests using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and false discovery rate (FDR), it can reduce the false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method considerably reduces the computational complexity of a permutation- or simulation-based approach, so it can efficiently tackle large datasets with ultra-high resolution images. The utility of our method is shown in a case-control study. Our approach can identify altered functional connectivities in a major depressive disorder dataset, whereas existing methods failed. A software package is available at https://github.com/weikanggong/BWAS.
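
For readers unfamiliar with the two baselines this abstract compares against, the following minimal sketch shows Bonferroni correction (which controls the FWER) and the Benjamini-Hochberg step-up procedure (a standard FDR method; the abstract does not say which FDR procedure was used, so this choice is an assumption). It does not implement the Gaussian random field framework of the paper, and the p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of tests rejected when each p-value is compared with alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]

def benjamini_hochberg(p_values, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that the k-th smallest p-value is <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])

# Hypothetical p-values from ten tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonferroni_hits = bonferroni(example_p)
bh_hits = benjamini_hochberg(example_p)
```

On this example Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg also rejects the second one. Neither procedure uses any spatial structure in the data, which is the information the abstract says its method exploits.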

A parsimonious model for mass-univariate vertex-wise analysis

10.1101/2021.01.22.427735 ◽ 2021

Author(s): Baptiste Couvy-Duchesne ◽ Futao Zhang ◽ Kathryn E. Kemper ◽ Julia Sidorenko ◽ Naomi R. Wray ◽ ...

Keyword(s): False Positive ◽ Grey Matter ◽ Fluid Intelligence ◽ Smoking Status ◽ Association Studies ◽ False Positive Rate ◽ Joint Effect ◽ Association Analyses ◽ Functional Brain Networks ◽ Positive Rate

Abstract: Covariance between grey-matter measurements can reflect structural or functional brain networks, though it has also been shown to be influenced by confounding factors (e.g. age, head size, scanner), which could lead to lower mapping precision (increased size of associated clusters) and create distal false-positive associations in mass-univariate vertex-wise analyses. We evaluated this concern by performing state-of-the-art mass-univariate analyses (general linear model, GLM) on traits simulated from real vertex-wise grey matter data (including cortical and subcortical thickness and surface area). We contrasted the results with those from linear mixed models (LMMs), which have been shown to overcome similar issues in omics association studies. We showed that when performed on a large sample (N=8,662, UK Biobank), GLMs yielded large spatial clusters of significant vertices and a greatly inflated false positive rate (Family Wise Error Rate: FWER=1, cluster false discovery rate: FDR>0.6). We showed that LMMs resulted in more parsimonious results: smaller clusters and a reduced false positive rate (yet FWER>5% after Bonferroni correction), but at a cost of increased computation. In practice, the parsimony of LMMs results from controlling for the joint effect of all vertices, which prevents local and distal redundant associations from reaching significance. Next, we performed mass-univariate association analyses on five real UKB traits (age, sex, BMI, fluid intelligence and smoking status), and LMM yielded fewer and more localised associations. We identified 19 significant clusters displaying small associations with age, sex and BMI, which suggests a complex architecture of at least dozens of areas associated with those phenotypes.

Stratification-Score Matching Improves Correction for Confounding by Population Stratification in Case-Control Association Studies

Genetic Epidemiology ◽ 10.1002/gepi.21611 ◽ 2012 ◽ Vol 36 (3) ◽ pp. 195-205 ◽ Cited By ~ 15

Author(s): Michael P. Epstein ◽ Richard Duncan ◽ K. Alaine Broadaway ◽ Min He ◽ Andrew S. Allen ◽ ...

Keyword(s): Population Stratification ◽ Association Studies ◽ Case Control ◽ Control Association

Evaluating bias due to population stratification in case-control association studies of admixed populations

Genetic Epidemiology ◽ 10.1002/gepi.20003 ◽ 2004 ◽ Vol 27 (1) ◽ pp. 14-20 ◽ Cited By ~ 48

Author(s): Yiting Wang ◽ Russell Localio ◽ Timothy R. Rebbeck

Keyword(s): Population Stratification ◽ Association Studies ◽ Case Control ◽ Control Association

Effect of Population Stratification on Case-Control Association Studies

Human Heredity ◽ 10.1159/000081455 ◽ 2004 ◽ Vol 58 (1) ◽ pp. 40-48 ◽ Cited By ~ 22

Author(s): Prakash Gorroochurn ◽ Susan E. Hodge ◽ Gary Heiman ◽ David A. Greenberg

Keyword(s): Population Stratification ◽ Association Studies ◽ Case Control ◽ Control Association

Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors

Nature Communications ◽ 10.1038/s41467-021-27438-7 ◽ 2021 ◽ Vol 12 (1)

Author(s): Wenhan Chen ◽ Yang Wu ◽ Zhili Zheng ◽ Ting Qi ◽ Peter M. Visscher ◽ ...

Keyword(s): False Positive ◽ Rare Variants ◽ Control Method ◽ Association Studies ◽ False Positive Rate ◽ Reference Sample ◽ Genome Wide Association Studies ◽ Summary Statistics ◽ Positive Rate ◽ Summary Data

Abstract: Summary statistics from genome-wide association studies (GWAS) have facilitated the development of various summary data-based methods, which typically require a reference sample for linkage disequilibrium (LD) estimation. Analyses using these methods may be biased by errors in GWAS summary data or LD reference or heterogeneity between GWAS and LD reference. Here we propose a quality control method, DENTIST, that leverages LD among genetic variants to detect and eliminate errors in GWAS or LD reference and heterogeneity between the two. Through simulations, we demonstrate that DENTIST substantially reduces false-positive rate in detecting secondary signals in the summary-data-based conditional and joint association analysis, especially for imputed rare variants (false-positive rate reduced from >28% to <2% in the presence of heterogeneity between GWAS and LD reference). We further show that DENTIST can improve other summary-data-based analyses such as fine-mapping analysis.

Effect of Population Stratification on Case-Control Association Studies

Human Heredity ◽ 10.1159/000081454 ◽ 2004 ◽ Vol 58 (1) ◽ pp. 30-39 ◽ Cited By ~ 29

Author(s): Gary A. Heiman ◽ Susan E. Hodge ◽ Prakash Gorroochurn ◽ Junying Zhang ◽ David A. Greenberg

Keyword(s): Population Stratification ◽ Association Studies ◽ Case Control ◽ Control Association

Simultaneously Correcting for Population Stratification and for Genotyping Error in Case-Control Association Studies

The American Journal of Human Genetics ◽ 10.1086/520962 ◽ 2007 ◽ Vol 81 (4) ◽ pp. 726-743 ◽ Cited By ~ 6

Author(s): K.F. Cheng ◽ W.J. Lin

Keyword(s): Population Stratification ◽ Association Studies ◽ Case Control ◽ Genotyping Error ◽ Control Association
