Improving power for rare variant tests by integrating external controls

Mapping Intimacies ◽

10.1101/081711 ◽

2016 ◽

Author(s):

Seunggeun Lee ◽

Sehee Kim ◽

Christian Fuchsberger

Keyword(s):

Rare Variant ◽

Type I Error ◽

Error Rates ◽

Age Related Macular Degeneration ◽

External Control ◽

Type I ◽

Batch Effects ◽

Age Related ◽

Sequencing Platforms ◽

Control Samples

AbstractDue to the drop in sequencing cost, the number of sequenced genomes is increasing rapidly. To improve power of rare variant tests, these sequenced samples could be used as external control samples in addition to control samples from the study itself. However, when using external controls, possible batch effects due to the use of different sequencing platforms or genotype calling pipelines can dramatically increase type I error rates. To address this, we propose novel summary statistics-based single and gene- or region-based rare-variant tests that allow the integration of external controls while controlling for type I error. Our approach is based on the insight that batch effects on a given variant can be assessed by comparing odds ratio estimates using internal controls only vs. using combined control samples of internal and external controls. From simulation experiments and the analysis of data from age related macular degeneration and type 2 diabetes studies, we demonstrate that our method can substantially improve power while controlling for type I error rate.

Download Full-text

The exhaustive genomic scan approach, with an application to rare-variant association analysis

10.1101/571752 ◽

2019 ◽

Author(s):

George Kanoungi ◽

Michael Nothnagel ◽

Tim Becker ◽

Dmitriy Drichel

Keyword(s):

Rare Variant ◽

Association Studies ◽

A Priori ◽

Error Rates ◽

Age Related Macular Degeneration ◽

Data Sets ◽

Rare Variant Association ◽

Age Related ◽

Genome Wide ◽

Definition Of

AbstractRegion-based genome-wide scans are usually performed by use of a priori chosen analysis regions. Such an approach will likely miss the region comprising the strongest signal and, thus, may result in increased type II error rates and decreased power. Here, we propose a genomic exhaustive scan approach that analyzes all possible subsequences and does not rely on a prior definition of the analysis regions. As a prime instance, we present a computationally ultra-efficient implementation using the rare-variant collapsing test for phenotypic association, the genomic exhaustive collapsing scan (GECS). Our implementation allows for the identification of regions comprising the strongest signals in large, genome-wide rare-variant association studies while controlling the family-wise error rate via permutation. Application of GECS to two genomic data sets revealed several novel significantly associated regions for age-related macular degeneration and for schizophrenia. Our approach also offers a high potential for genome-wide scans for selection, methylation and other analyses.

Download Full-text

Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole Genome Sequencing Studies

10.1101/552950 ◽

2019 ◽

Author(s):

Zilin Li ◽

Xihao Li ◽

Yaowu Liu ◽

Jincheng Shen ◽

Han Chen ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Rare Variant ◽

Type I Error ◽

Rare Variants ◽

Error Rates ◽

Type I ◽

Whole Genome ◽

Rare Variant Association ◽

Dynamic Scan

AbstractWhole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set based analyses are commonly used to analyze rare variants. However, existing variant-set based approaches need to pre-specify genetic regions for analysis, and hence are not directly applicable to WGS data due to the large number of intergenic and intron regions that consist of a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown as a priori, are difficult to specify in practice and are subject to limitations given genetic association region sizes are likely to vary across the genome and phenotypes. We propose a computationally-efficient and dynamic scan statistic method (Scan the Genome (SCANG)) for analyzing WGS data that flexibly detects the sizes and the locations of rare-variants association regions without the need of specifying a prior fixed window size. The proposed method controls the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected rare variants association region sizes to vary across the genome. Through extensive simulated studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling for the genome-wise type I error rates. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.

Download Full-text

UK-Biobank Whole Exome Sequence Binary Phenome Analysis with Robust Region-based Rare Variant Test

10.1101/697912 ◽

2019 ◽

Cited By ~ 2

Author(s):

Zhangchen Zhao ◽

Wenjian Bi ◽

Wei Zhou ◽

Peter VandeHaar ◽

Lars G. Fritsche ◽

...

Keyword(s):

Data Analysis ◽

Rare Variant ◽

Type I Error ◽

Case Control ◽

Error Rates ◽

Type I ◽

Uk Biobank ◽

Type I Error Rates ◽

Whole Exome ◽

Exome Sequence

AbstractIn biobank data analysis, most binary phenotypes have unbalanced case-control ratios, which can cause inflation of type I error rates. Recently, a saddlepoint approximation (SPA) based single variant test has been developed to provide an accurate and scalable method to test for associations of such phenotypes. For gene- or region-based multiple variant tests, a few methods exist which adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable for large data analyses. To address these problems, we propose SKAT/SKAT-O type region-based tests, where the single-variant score statistic is calibrated based on SPA and Efficient Resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p-values. In contrast, the unadjusted approach has greatly inflated type I error rates (90 times of exome-wideα=2.5×10-6) when the case-control ratio is 1:99. Additionally, the proposed method has similar computation time as the unadjusted approaches and is scalable for large sample data. Our UK Biobank whole exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare variant associations with p-value < 10-7, including the associations betweenJAK2and myeloproliferative disease,TNCand large cell lymphoma andF11and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server.

Download Full-text

Set-based rare variant association tests for biobank scale sequencing data sets

10.1101/2021.07.12.21260400 ◽

2021 ◽

Author(s):

Wei Zhou ◽

Wenjian Bi ◽

Zhangchen Zhao ◽

Kushal K. Dey ◽

Karthik A. Jagadeesh ◽

...

Keyword(s):

Rare Variant ◽

Error Control ◽

Type I Error ◽

Error Rates ◽

Data Sets ◽

Type I ◽

Sequencing Data ◽

Functional Annotations ◽

Whole Exome ◽

Inflated Type

UK Biobank has released the whole-exome sequencing (WES) data for 200,000 participants, but the best practices remain unclear for rare variant tests, and an existing approach, SAIGE-GENE, can have inflated type I error rates with high computation cost. Here, we propose SAIGE-GENE+ with greatly improved type I error control and computational efficiency compared to SAIGE-GENE. In the analysis of UKBB WES data of 30 quantitative and 141 binary traits, SAIGE-GENE+ identified 551 gene-phenotype associations. In addition, we showed that incorporating multiple MAF cutoffs and functional annotations can help identify novel gene-phenotype associations and SAIGE-GENE+ can facilitate this.

Download Full-text

Type I error rates and power of several versions of scaled chi-square difference tests in investigations of measurement invariance.

Psychological Methods ◽

10.1037/met0000097 ◽

2017 ◽

Vol 22 (3) ◽

pp. 467-485 ◽

Cited By ~ 4

Author(s):

Jordan Campbell Brace ◽

Victoria Savalei

Keyword(s):

Measurement Invariance ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Chi Square ◽

Type I Error Rates

Download Full-text

Correction: “Influence of Selection Bias on the Test Decision – A Simulation Study”

Methods of Information in Medicine ◽

10.3414/me11-01-0043e ◽

2014 ◽

Vol 53 (05) ◽

pp. 343-343

Keyword(s):

Selection Bias ◽

Simulation Study ◽

Error Rate ◽

Type I Error ◽

Block Size ◽

Error Rates ◽

Type I ◽

Type I Error Rate ◽

Representation Error ◽

Numeric Representation

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138 –143). In a small number of cases the kind of representation of numeric values in SAS has resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.

Download Full-text

The Use of Theory of Linear Mixed-Effects Models to Detect Fraudulent Erasures at an Aggregate Level

Educational and Psychological Measurement ◽

10.1177/0013164421994893 ◽

2021 ◽

pp. 001316442199489

Author(s):

Luyao Peng ◽

Sandip Sinharay

Keyword(s):

Type I Error ◽

Real Data ◽

Mixed Effects ◽

Error Rates ◽

Mixed Effects Models ◽

Type I ◽

Aggregate Level ◽

Linear Mixed Effects Models ◽

Linear Mixed Effects ◽

Best Linear Unbiased

Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the literature of linear mixed-effects models (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has larger power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.

Download Full-text

Type I Error Rates, Coverage of Confidence Intervals, and Variance Estimation in Propensity-Score Matched Analyses

The International Journal of Biostatistics ◽

10.2202/1557-4679.1146 ◽

2009 ◽

Vol 5 (1) ◽

Cited By ~ 65

Author(s):

Peter C Austin

Keyword(s):

Propensity Score ◽

Confidence Intervals ◽

Variance Estimation ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Type I Error Rates

Download Full-text

The Robustness of the Likelihood Ratio Chi-Square Test for Structural Equation Models: A Meta-Analysis

Journal of Educational and Behavioral Statistics ◽

10.3102/10769986026001105 ◽

2001 ◽

Vol 26 (1) ◽

pp. 105-132 ◽

Cited By ~ 30

Author(s):

Douglas A. Powell ◽

William D. Schafer

Keyword(s):

Structural Equation ◽

Structural Equation Models ◽

Type I Error ◽

Meta Analysis ◽

Generalized Least Squares ◽

Error Rates ◽

Type I ◽

Chi Square ◽

Distribution Free ◽

Projection Techniques

The robustness literature for the structural equation model was synthesized following the method of Harwell which employs meta-analysis as developed by Hedges and Vevea. The study focused on the explanation of empirical Type I error rates for six principal classes of estimators: two that assume multivariate normality (maximum likelihood and generalized least squares), elliptical estimators, two distribution-free estimators (asymptotic and others), and latent projection. Generally, the chi-square tests for overall model fit were found to be sensitive to non-normality and the size of the model for all estimators (with the possible exception of the elliptical estimators with respect to model size and the latent projection techniques with respect to non-normality). The asymptotic distribution-free (ADF) and latent projection techniques were also found to be sensitive to sample sizes. Distribution-free methods other than ADF showed, in general, much less sensitivity to all factors considered.

Download Full-text

Type I Error Rates for Parscale’s Fit Index

Educational and Psychological Measurement ◽

10.1177/0013164404264849 ◽

2005 ◽

Vol 65 (1) ◽

pp. 42-50 ◽

Cited By ~ 17

Author(s):

Christine E. Demars

Keyword(s):

Type I Error ◽

Error Rates ◽

Type I ◽

Type I Error Rates ◽

Fit Index

Download Full-text