A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS

2017 ◽  
Author(s):  
Rounak Dey ◽  
Ellen M. Schmidt ◽  
Goncalo R. Abecasis ◽  
Seunggeun Lee

Abstract
The availability of electronic health record (EHR)-based phenotypes allows for genome-wide association analyses in thousands of traits, and has great potential to identify novel genetic variants associated with clinical phenotypes. We can interpret the phenome-wide association study (PheWAS) result for a single genetic variant by observing its association across a landscape of phenotypes. Since PheWAS can test thousands of binary phenotypes, most of which have unbalanced (case:control = 1:10) or often extremely unbalanced (case:control = 1:600) case-control ratios, existing methods cannot provide an accurate and scalable way to test for associations. Here we propose a computationally fast score-test-based method that estimates the distribution of the test statistic using the saddlepoint approximation. Our method is approximately 100 times faster than the state-of-the-art Firth's test. It can also adjust for covariates and control type I error rates even when the case-control ratio is extremely unbalanced. Through application to PheWAS data from the Michigan Genomics Initiative, we show that the proposed method can control type I error rates while replicating previously known association signals even for traits with a very small number of cases and a large number of controls.
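The core computation the abstract describes, a saddlepoint approximation to the tail probability of the single-variant score statistic, can be sketched generically. The sketch below applies the Barndorff-Nielsen adjusted approximation to S = Σ gᵢ(yᵢ − μ̂ᵢ) with yᵢ ~ Bernoulli(μ̂ᵢ) under the null; the function name, the root-finding bracket, and the two-sided convention are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def spa_pvalue(g, mu, s):
    """Two-sided saddlepoint p-value for the score S = sum_i g_i (y_i - mu_i),
    where y_i ~ Bernoulli(mu_i) under the null (illustrative sketch)."""
    g, mu = np.asarray(g, float), np.asarray(mu, float)

    def K(t):   # cumulant generating function of S
        return np.sum(np.log(1 - mu + mu * np.exp(g * t))) - t * np.sum(g * mu)

    def K1(t):  # first derivative K'(t)
        e = mu * np.exp(g * t)
        return np.sum(g * e / (1 - mu + e)) - np.sum(g * mu)

    def K2(t):  # second derivative K''(t)
        e = mu * np.exp(g * t)
        p = e / (1 - mu + e)
        return np.sum(g * g * p * (1 - p))

    def tail(q, upper):
        # solve the saddlepoint equation K'(zeta) = q
        zeta = brentq(lambda t: K1(t) - q, -50.0, 50.0)
        w = np.sign(zeta) * np.sqrt(2.0 * (zeta * q - K(zeta)))
        v = zeta * np.sqrt(K2(zeta))
        z = w + np.log(v / w) / w   # Barndorff-Nielsen adjustment
        return norm.sf(z) if upper else norm.cdf(z)

    # two-sided: P(S >= |s|) + P(S <= -|s|); assumes |s| is not near zero,
    # where w -> 0 and the formula needs a special case
    return tail(abs(s), True) + tail(-abs(s), False)
```

Because the approximation works with the full cumulant generating function rather than only its first two moments, it remains accurate in the far tail even when the case-control ratio is extreme, which is the point of the method.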

2019 ◽  
Author(s):  
Zhangchen Zhao ◽  
Wenjian Bi ◽  
Wei Zhou ◽  
Peter VandeHaar ◽  
Lars G. Fritsche ◽  
...  

Abstract
In biobank data analysis, most binary phenotypes have unbalanced case-control ratios, which can cause inflation of type I error rates. Recently, a saddlepoint approximation (SPA)-based single-variant test was developed to provide an accurate and scalable method to test for associations of such phenotypes. For gene- or region-based multiple-variant tests, a few methods exist that adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable for large data analyses. To address these problems, we propose SKAT/SKAT-O type region-based tests in which the single-variant score statistic is calibrated based on SPA and efficient resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p-values. In contrast, the unadjusted approach has greatly inflated type I error rates (90 times the exome-wide level α = 2.5 × 10⁻⁶) when the case-control ratio is 1:99. Additionally, the proposed method has computation time similar to the unadjusted approaches and is scalable for large-sample data. Our UK Biobank whole-exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare-variant associations with p-value < 10⁻⁷, including the associations between JAK2 and myeloproliferative disease, TNC and large cell lymphoma, and F11 and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server.
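For context, the SKAT-type region statistic the abstract builds on is a weighted sum of squared single-variant score statistics. The minimal sketch below computes it for a covariate-free Bernoulli null and uses a moment-matched (Satterthwaite) chi-square for the p-value; the paper's contribution, calibrating each single-variant score with SPA and ER, is not reproduced here, and the function name and null model are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

def skat_pvalue(G, y, mu, w):
    """SKAT variance-component test, Satterthwaite-calibrated (illustrative).
    G: n x m genotype matrix, y: binary phenotype, mu: null fitted means,
    w: length-m variant weights."""
    r = y - mu                              # score residuals under the null
    Q = np.sum(w * (G.T @ r) ** 2)          # SKAT statistic
    # Null distribution: Q ~ sum_k lambda_k * chi2_1, with lambda_k the
    # eigenvalues of W^{1/2} G' diag(V) G W^{1/2}, V_i = mu_i (1 - mu_i).
    V = mu * (1 - mu)
    A = (np.sqrt(w)[:, None] * G.T) * V     # W^{1/2} G' diag(V)
    M = (A @ G) * np.sqrt(w)[None, :]       # W^{1/2} G' diag(V) G W^{1/2}
    lam = np.linalg.eigvalsh(M)
    lam = lam[lam > 1e-10]                  # drop numerically zero eigenvalues
    # Satterthwaite moment matching: Q approx a * chi2_df
    a = np.sum(lam ** 2) / np.sum(lam)
    df = np.sum(lam) ** 2 / np.sum(lam ** 2)
    return chi2.sf(Q / a, df)
```

With a single variant and unit weight this reduces exactly to the usual score test, which is a convenient sanity check; it is also where the unadjusted chi-square calibration breaks down under extreme case-control imbalance, motivating the SPA/ER calibration the paper proposes.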


1995 ◽  
Vol 20 (1) ◽  
pp. 27-39 ◽  
Author(s):  
James Algina ◽  
R. Clifford Blair ◽  
William T. Coombs

A maximum test in which the test statistic is the more extreme of the Brown-Forsythe and O’Brien’s test statistics is developed. Estimated Type I error rates and power are presented for the Brown-Forsythe test, O’Brien’s test, and the maximum test. For the conditions included in the study, Type I error rates for the maximum test are near the nominal level. In all conditions, the power of the maximum test tended to be equal to or greater than that of the test—O’Brien or Brown-Forsythe—that had the larger power.
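The construction is straightforward to sketch: compute both scale statistics and keep the more extreme. Below, Brown-Forsythe is the median-centered Levene statistic and O'Brien's statistic is an ANOVA on O'Brien-transformed scores; the permutation calibration of the maximum is a generic stand-in for the article's critical values, and all function names are illustrative. It assumes the groups share a common location, as permutation tests of scale require.

```python
import numpy as np
from scipy.stats import levene, f_oneway

def obrien_transform(x):
    """O'Brien (1979) scores; their group mean equals the group variance,
    so an ANOVA on them tests homogeneity of variance."""
    x = np.asarray(x, float)
    n, m, s2 = len(x), x.mean(), x.var(ddof=1)
    return ((n - 1.5) * n * (x - m) ** 2 - 0.5 * s2 * (n - 1)) / ((n - 1) * (n - 2))

def max_test_stat(groups):
    f_bf, _ = levene(*groups, center='median')                  # Brown-Forsythe
    f_ob, _ = f_oneway(*[obrien_transform(g) for g in groups])  # O'Brien
    return max(f_bf, f_ob)

def max_test_pvalue(groups, n_perm=999, seed=0):
    """Permutation p-value for the maximum statistic (illustrative calibration)."""
    rng = np.random.default_rng(seed)
    obs = max_test_stat(groups)
    pooled = np.concatenate(groups)
    cuts = np.cumsum([len(g) for g in groups])[:-1]
    hits = sum(max_test_stat(np.split(rng.permutation(pooled), cuts)) >= obs
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```

Taking the maximum of the two statistics inherits the better power of whichever test suits the data, which is exactly the behavior the study reports, at the cost of needing a calibration for the combined statistic.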


2019 ◽  
Vol 14 (2) ◽  
pp. 399-425 ◽  
Author(s):  
Haolun Shi ◽  
Guosheng Yin

2014 ◽  
Vol 38 (2) ◽  
pp. 109-112 ◽  
Author(s):  
Daniel Furtado Ferreira

Sisvar is a statistical analysis system widely used by the scientific community to produce statistical analyses and to support scientific results and conclusions. Its broad adoption is due to its accuracy, precision, simplicity and robustness. Among its many analysis options, Sisvar offers a less widely used one: multiple comparison procedures based on bootstrap approaches. This paper reviews this subject and shows some advantages of using Sisvar to perform such analyses to compare treatment means. Tests like Dunnett, Tukey, Student-Newman-Keuls and Scott-Knott, performed alternatively by bootstrap methods, show greater power and better control of experimentwise type I error rates under non-normal, asymmetric, platykurtic or leptokurtic distributions.
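To give a flavor of what a bootstrap multiple-comparison procedure does (this is a generic max-|t| resampling sketch, not Sisvar's implementation; all names are illustrative): resample residuals under the null of equal means, record the largest pairwise |t| in each resample, and adjust every observed pairwise comparison against that maximum distribution. Referring each comparison to the maximum is what controls the experimentwise error rate, without relying on normality.

```python
import numpy as np
from itertools import combinations

def pairwise_t(groups):
    """|t| for every pair of group means, using the pooled variance."""
    ns = np.array([len(g) for g in groups])
    s2p = sum((n - 1) * np.var(g, ddof=1)
              for n, g in zip(ns, groups)) / (ns.sum() - len(groups))
    means = [np.mean(g) for g in groups]
    return {(i, j): abs(means[i] - means[j]) / np.sqrt(s2p * (1 / ns[i] + 1 / ns[j]))
            for i, j in combinations(range(len(groups)), 2)}

def bootstrap_maxt_pvalues(groups, n_boot=1999, seed=0):
    """Experimentwise-adjusted p-values from the bootstrap max-|t| distribution."""
    rng = np.random.default_rng(seed)
    obs = pairwise_t(groups)
    centered = [g - np.mean(g) for g in groups]   # impose the null of equal means
    max_t = np.array([
        max(pairwise_t([rng.choice(c, size=len(c), replace=True)
                        for c in centered]).values())
        for _ in range(n_boot)])
    return {pair: (np.sum(max_t >= t) + 1) / (n_boot + 1) for pair, t in obs.items()}
```

Because the reference distribution is built from the data's own resampled residuals, it adapts to asymmetric, platykurtic or leptokurtic error distributions, which is where the abstract reports the bootstrap versions gaining power over the classical tests.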


2021 ◽  
Author(s):  
Megha Joshi ◽  
James E Pustejovsky ◽  
S. Natasha Beretvas

The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence in the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method to handle dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small-sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.
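The resampling scheme itself is easy to sketch. In a cluster wild bootstrap, each cluster's residuals are multiplied by a single random sign (a Rademacher weight), so within-cluster dependence is preserved in every pseudo-sample. The sketch below bootstraps the coefficient under a null-imposed fit for a plain clustered regression; the authors work with cluster-robust test statistics, and their R package is not reproduced here, so treat the function and its details as illustrative assumptions.

```python
import numpy as np

def cluster_wild_bootstrap_p(X, y, cluster, coef, n_boot=999, seed=0):
    """p-value for H0: beta[coef] = 0 via a Rademacher cluster wild bootstrap
    with the null imposed (illustrative sketch, not a robust-t version)."""
    rng = np.random.default_rng(seed)
    ols = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]
    b_obs = ols(X, y)[coef]
    # refit with the tested column removed so resampling happens under H0
    Xr = np.delete(X, coef, axis=1)
    fit0 = Xr @ ols(Xr, y)
    resid0 = y - fit0
    ids = np.unique(cluster)
    pos = np.searchsorted(ids, cluster)     # map each row to its cluster index
    hits = 0
    for _ in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=len(ids))  # one sign per cluster
        y_star = fit0 + resid0 * w[pos]             # flip whole clusters together
        hits += abs(ols(X, y_star)[coef]) >= abs(b_obs)
    return (hits + 1) / (n_boot + 1)
```

Because the number of independent sign flips equals the number of clusters (studies), the procedure's calibration degrades gracefully as that number shrinks, which is why it is attractive for meta-analyses with few studies.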

