Multiple testing and Type I errors: A reply in defense of multifactor designs.

1974 ◽  
Vol 29 (10) ◽  
pp. 778-779 ◽  
Author(s):  
H. J. Keselman


Author(s):  
Jinsong Chen ◽  
Mark J. van der Laan ◽  
Martyn T. Smith ◽  
Alan E. Hubbard

Microarray studies often need to examine thousands of genes simultaneously to determine which are differentially expressed. One main challenge in such studies is to find suitable multiple testing procedures that provide accurate control of the error rates of interest while remaining as powerful as possible, that is, returning the longest list of truly interesting genes among competitors. Many multiple testing methods have been developed recently for microarray data analysis, especially resampling-based methods such as permutation methods, the null-centered and scaled bootstrap (NCSB) method, and the quantile-transformed-bootstrap-distribution (QTBD) method. Each of these methods has its own merits and limitations. Theoretically, permutation methods can fail to provide accurate control of Type I errors when the so-called subset pivotality condition is violated. The NCSB method does not suffer from that limitation, but an impractical number of bootstrap samples is often needed to obtain proper control of Type I errors. The newly developed QTBD method has the virtue of providing accurate control of Type I errors under few restrictions. However, the relative practical performance of these three types of multiple testing methods remains unresolved. This paper compares the three resampling-based methods with respect to control of the family-wise error rate (FWER) through data simulations. Results show that among the three resampling-based methods, the QTBD method provides relatively accurate and powerful control in more general circumstances.
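For orientation, the sketch below shows a generic Westfall-Young style maxT permutation procedure for FWER control in a two-group comparison. It relies on the subset pivotality condition discussed above and is not the NCSB or QTBD method; the function and argument names are ours.

```python
import numpy as np

def maxT_permutation_fwer(x, y, n_perm=1000, alpha=0.05, seed=0):
    """Generic Westfall-Young maxT permutation sketch (not NCSB or QTBD).

    x, y : arrays of shape (n_genes, n_samples_group1) and
           (n_genes, n_samples_group2) of expression values.
    Returns FWER-adjusted p-values and the rejection indicator.
    """
    rng = np.random.default_rng(seed)
    data = np.hstack([x, y])
    n1 = x.shape[1]

    def t_stats(mat):
        a, b = mat[:, :n1], mat[:, n1:]
        se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] +
                     b.var(axis=1, ddof=1) / b.shape[1])
        return (a.mean(axis=1) - b.mean(axis=1)) / se

    t_obs = np.abs(t_stats(data))
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(data.shape[1])            # relabel samples
        max_null[i] = np.abs(t_stats(data[:, perm])).max()

    # adjusted p-value: how often the permutation maximum exceeds each observed statistic
    p_adj = (max_null[None, :] >= t_obs[:, None]).mean(axis=1)
    return p_adj, p_adj <= alpha
```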


2004 ◽  
Vol 3 (1) ◽  
pp. 1-25 ◽  
Author(s):  
Mark J. van der Laan ◽  
Sandrine Dudoit ◽  
Katherine S. Pollard

This article shows that any single-step or stepwise multiple testing procedure (asymptotically) controlling the family-wise error rate (FWER) can be augmented into procedures that (asymptotically) control tail probabilities for the number of false positives and the proportion of false positives among the rejected hypotheses. Specifically, given any procedure that (asymptotically) controls the FWER at level alpha, we propose simple augmentation procedures that provide (asymptotic) level-alpha control of: (i) the generalized family-wise error rate, i.e., the tail probability, gFWER(k), that the number of Type I errors exceeds a user-supplied integer k, and (ii) the tail probability, TPPFP(q), that the proportion of Type I errors among the rejected hypotheses exceeds a user-supplied value 0 < q < 1.
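A minimal sketch of the gFWER(k) augmentation idea, assuming FWER-adjusted p-values are already in hand; identifiers are illustrative and the TPPFP augmentation is omitted.

```python
import numpy as np

def augment_to_gfwer(fwer_adjusted_pvals, alpha, k):
    """Augmentation sketch: start from the rejections of any level-alpha
    FWER-controlling procedure (encoded here as FWER-adjusted p-values)
    and additionally reject the k next most significant hypotheses.
    The augmented set controls gFWER(k) at level alpha. Function and
    argument names are illustrative, not from the article.
    """
    p = np.asarray(fwer_adjusted_pvals, dtype=float)
    rejected = set(np.flatnonzero(p <= alpha).tolist())     # FWER-level rejections
    extras = [int(i) for i in np.argsort(p) if i not in rejected][:k]
    return sorted(rejected.union(extras))                   # indices of rejected hypotheses
```

For example, with adjusted p-values [0.01, 0.03, 0.08, 0.20], alpha = 0.05 and k = 1, the base procedure rejects hypotheses 0 and 1 and the augmentation additionally rejects hypothesis 2.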


Author(s):  
Thomas Verron ◽  
Xavier Cahours ◽  
Stéphane Colard

Summary During the last two decades, tobacco product reporting requirements from regulators, such as those in Europe, Canada, and the USA, have increased. However, the capacity to compare and accurately discriminate between two products is affected by the number of constituents used for the comparison. Performing a large number of simultaneous independent hypothesis tests increases the probability of rejecting a null hypothesis when it should not be rejected, which virtually guarantees the presence of type I errors among the findings. Correction methods such as the Bonferroni and Benjamini-Hochberg procedures have been developed to overcome this issue. The performance of these methods was assessed by comparing identical tobacco products with data sets of different sizes. Results showed that multiple comparisons lead to erroneous conclusions if the risk of type I error is not corrected. Unfortunately, reducing the type I error rate reduces the statistical power of the tests. Consequently, strategies for dealing with multiplicity of data should provide a reasonable balance between testing requirements and the statistical power of differentiation. Multiple testing for product comparison is less of a problem if studies are restricted to the most relevant parameters for comparison.
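For context, minimal sketches of the two corrections mentioned above (function names are ours):

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H_i if p_i <= alpha / m; controls the FWER."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: reject the k smallest p-values, where k is the largest
    index with p_(k) <= k * alpha / m; controls the false discovery rate."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.flatnonzero(below)) + 1   # largest ordered index meeting the bound
        reject[order[:k]] = True
    return reject
```

With p-values [0.001, 0.02, 0.04] at alpha = 0.05, for instance, bonferroni rejects only the first hypothesis (0.001 <= 0.05/3), while benjamini_hochberg rejects all three, illustrating the power trade-off discussed above.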


2015 ◽  
Vol 23 (2) ◽  
pp. 306-312 ◽  
Author(s):  
Annie Franco ◽  
Neil Malhotra ◽  
Gabor Simonovits

The accuracy of published findings is compromised when researchers fail to report and adjust for multiple testing. Preregistration of studies and the requirement of preanalysis plans for publication are two proposed solutions to combat this problem. Some have raised concerns that such changes in research practice may hinder inductive learning. However, without knowing the extent of underreporting, it is difficult to assess the costs and benefits of institutional reforms. This paper examines published survey experiments conducted as part of the Time-sharing Experiments in the Social Sciences program, where the questionnaires are made publicly available, allowing us to compare planned design features against what is reported in published research. We find that: (1) 30% of papers report fewer experimental conditions in the published paper than in the questionnaire; (2) roughly 60% of papers report fewer outcome variables than are listed in the questionnaire; and (3) about 80% of papers fail to report all experimental conditions and outcomes. These findings suggest that published statistical tests understate the probability of type I errors.
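As a reminder of why unreported tests matter, a back-of-the-envelope calculation (assuming independent tests, each at alpha = 0.05) shows how quickly the chance of at least one false positive grows with the number of tests actually run:

```python
# Probability of at least one false positive when m independent tests are
# each run at alpha = 0.05 but only a subset is reported (illustrative only).
alpha = 0.05
for m in (1, 5, 10, 20):
    print(m, round(1 - (1 - alpha) ** m, 3))
# 1 0.05   5 0.226   10 0.401   20 0.642
```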


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 962 ◽  
Author(s):  
Judith ter Schure ◽  
Peter Grünwald

Studies accumulate over time and meta-analyses are mainly retrospective. These two characteristics introduce dependencies between the analysis time, at which a series of studies is up for meta-analysis, and results within the series. Dependencies introduce bias (Accumulation Bias) and invalidate the sampling distribution assumed for p-value tests, thus inflating type I errors. But dependencies are also inevitable, since for science to accumulate efficiently, new research needs to be informed by past results. Here, we investigate various ways in which time influences error control in meta-analysis testing. We introduce an Accumulation Bias Framework that allows us to model a wide variety of practically occurring dependencies, including study series accumulation, meta-analysis timing, and approaches to multiple testing in living systematic reviews. The strength of this framework is that it shows how all dependencies affect p-value-based tests in a similar manner. This leads to two main conclusions. First, Accumulation Bias is inevitable, and even if it can be approximated and accounted for, no valid p-value tests can be constructed. Second, tests based on likelihood ratios withstand Accumulation Bias: they provide bounds on error probabilities that remain valid despite the bias. We leave the reader with a choice between two proposals to consider time in error control: either treat individual (primary) studies and meta-analyses as two separate worlds, each with its own timing, or integrate individual studies into the meta-analysis world. Taking up likelihood ratios in either approach allows for valid tests that relate well to the accumulating nature of scientific knowledge. Likelihood ratios can be interpreted as betting profits, earned in previous studies and invested in new ones, while the meta-analyst is allowed to cash out at any time and advise against future studies.
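As a rough illustration of the likelihood-ratio approach described above, the sketch below multiplies per-study likelihood ratios for a simple-versus-simple normal-means setup and rejects once the running product exceeds 1/alpha. The setup, parameter names, and defaults are our illustrative assumptions, not the authors' framework.

```python
import numpy as np
from scipy.stats import norm

def lr_meta_test(study_means, n_per_study, sigma=1.0, delta=0.5, alpha=0.05):
    """Likelihood-ratio (e-value style) meta-analysis sketch: multiply
    per-study likelihood ratios of H1: mu = delta vs H0: mu = 0, computed
    from each study's sample mean, and reject once the running product
    reaches 1/alpha. By Ville's inequality, the type I error stays below
    alpha at any (data-dependent) analysis time, which is what makes
    likelihood-ratio tests robust to accumulation-type dependencies.
    """
    running = 1.0
    for xbar, n in zip(study_means, n_per_study):
        se = sigma / np.sqrt(n)
        lr = norm.pdf(xbar, loc=delta, scale=se) / norm.pdf(xbar, loc=0.0, scale=se)
        running *= lr
        if running >= 1 / alpha:          # the meta-analyst may "cash out" here
            return True, running
    return False, running
```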


Methodology ◽  
2015 ◽  
Vol 11 (3) ◽  
pp. 110-115 ◽  
Author(s):  
Rand R. Wilcox ◽  
Jinxia Ma

Abstract. The paper compares methods that allow both within-group and between-group heteroscedasticity when performing all pairwise comparisons of the least squares lines associated with J independent groups. The methods are based on a simple extension of results derived by Johansen (1980) and Welch (1938) in conjunction with the HC3 and HC4 estimators. The probability of one or more Type I errors is controlled using the improvement on the Bonferroni method derived by Hochberg (1988). Results are illustrated using data from the Well Elderly 2 study, which motivated this paper.
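For reference, the Hochberg (1988) step-up improvement on Bonferroni used to control the probability of one or more Type I errors can be sketched as follows; this is a generic implementation, not tied to the HC3/HC4-based test statistics in the paper, and the function name is ours.

```python
import numpy as np

def hochberg(pvals, alpha=0.05):
    """Hochberg (1988) step-up procedure (sketch): with ordered p-values
    p_(1) <= ... <= p_(m), find the largest i with p_(i) <= alpha / (m - i + 1)
    and reject H_(1), ..., H_(i). Never less powerful than Bonferroni while
    still controlling the probability of one or more Type I errors.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                      # ascending order of p-values
    reject = np.zeros(m, dtype=bool)
    for i in range(m, 0, -1):                  # step up from the largest p-value
        if p[order[i - 1]] <= alpha / (m - i + 1):
            reject[order[:i]] = True
            break
    return reject
```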


2020 ◽  
Vol 39 (3) ◽  
pp. 185-208
Author(s):  
Qiao Xu ◽  
Rachana Kalelkar

SUMMARY This paper examines whether inaccurate going-concern opinions negatively affect the audit office's reputation. Assuming that clients perceive the incidence of going-concern opinion errors as a systematic audit quality concern within the entire audit office, we expect these inaccuracies to impact the audit office market share and dismissal rate. We find that going-concern opinion inaccuracy is negatively associated with the audit office market share and is positively associated with the audit office dismissal rate. Furthermore, we find that the decline in market share and the increase in dismissal rate are primarily associated with Type I errors. Additional analyses reveal that the negative consequence of going-concern opinion inaccuracy is lower for Big 4 audit offices. Finally, we find that the decrease in the audit office market share is explained by the distressed clients' reactions to Type I errors and audit offices' lack of ability to attract new clients.


Biostatistics ◽  
2017 ◽  
Vol 18 (3) ◽  
pp. 477-494 ◽  
Author(s):  
Jakub Pecanka ◽  
Marianne A. Jonker ◽  
Zoltan Bochdanovits ◽  
Aad W. Van Der Vaart

Summary For over a decade, functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the "missing heritability" of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests, which results in a serious loss of statistical power as well as computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci while the more demanding score tests are restricted to the most promising pairs, a genome-wide association study (GWAS) for epistasis becomes feasible. By design, our method provides strong control of the type I error. Its favourable power properties, especially under the practically relevant misspecification of the interaction model, are illustrated. Ready-to-use software is available. Using the method we analyzed Parkinson's disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
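A minimal sketch of the two-stage screening idea described above, assuming 0/1/2 genotype coding. Note that we substitute a likelihood-ratio test for the paper's score test for brevity, and all names and thresholds are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency, chi2
import statsmodels.api as sm

def two_stage_epistasis(g1, g2, case, screen_alpha=1e-4):
    """Two-stage sketch: (1) screen a locus pair with a genotype-independence
    chi-square test in cases only; (2) only if the pair passes, test the
    interaction term in a logistic model of case status (likelihood-ratio
    test used here in place of the paper's score test).

    g1, g2 : genotype codes (0/1/2) per subject; case : 0/1 disease status.
    """
    g1, g2, case = map(np.asarray, (g1, g2, case))

    # Stage 1: 3x3 genotype contingency table among cases only
    cases = case == 1
    table = np.zeros((3, 3))
    for a, b in zip(g1[cases], g2[cases]):
        table[int(a), int(b)] += 1
    _, p_screen, _, _ = chi2_contingency(table + 0.5)   # small padding avoids empty cells
    if p_screen > screen_alpha:
        return None                                      # pair not promising, skip stage 2

    # Stage 2: interaction term in a logistic regression of case status
    X0 = sm.add_constant(np.column_stack([g1, g2]))
    X1 = sm.add_constant(np.column_stack([g1, g2, g1 * g2]))
    ll0 = sm.Logit(case, X0).fit(disp=0).llf
    ll1 = sm.Logit(case, X1).fit(disp=0).llf
    lr = 2 * (ll1 - ll0)
    return chi2.sf(lr, df=1)                             # interaction p-value
```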


2016 ◽  
Vol 27 (5) ◽  
pp. 1513-1530 ◽  
Author(s):  
Dena R Howard ◽  
Julia M Brown ◽  
Susan Todd ◽  
Walter M Gregory

Multi-arm clinical trials assessing multiple experimental treatments against a shared control group can offer efficiency advantages over independent trials through assessing an increased number of hypotheses. Published opinion is divided on the requirement for multiple testing adjustment to control the family-wise type I error rate (FWER). The probability of a false positive error in multi-arm trials compared to equivalent independent trials is affected by the correlation between comparisons due to sharing control data. We demonstrate that this correlation in fact leads to a reduction in the FWER; therefore, FWER adjustment is not recommended solely because control data are shared. In contrast, the correlation increases the probability of multiple false positive outcomes across the hypotheses, although standard FWER adjustment methods do not control for this. A stringent critical value adjustment is proposed to maintain equivalent evidence of superiority in two correlated comparisons to that obtained within independent trials. FWER adjustment is only required if there is an increased chance of making a single claim of effectiveness by testing multiple hypotheses, not because control data are shared. For competing experimental therapies, the correlation between comparisons can be advantageous, as it eliminates bias due to the experimental therapies being compared to different control populations.
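To illustrate the correlation effect described above, here is a small simulation sketch (our own, with illustrative sample sizes and one-sided z-tests) that estimates, under the global null, both the family-wise error rate and the probability that all shared-control comparisons are falsely significant.

```python
import numpy as np

def fwer_shared_control(n_sims=20000, n=100, arms=2, alpha_crit=1.645, seed=1):
    """Simulation sketch under the global null: 'arms' experimental groups
    are each compared with one shared control using one-sided z-tests on
    standard normal outcomes. Returns the estimated probability of at
    least one false positive and of all comparisons being falsely positive.
    All parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    control = rng.standard_normal((n_sims, n))
    exp = rng.standard_normal((n_sims, arms, n))
    # z statistic for each experimental arm versus the shared control group
    z = (exp.mean(axis=2) - control.mean(axis=1, keepdims=True)) / np.sqrt(2 / n)
    sig = z > alpha_crit
    return sig.any(axis=1).mean(), sig.all(axis=1).mean()
```

In runs of this kind, the chance of at least one false positive comes out slightly below what two independent trials would give, while the chance of both comparisons being falsely significant is noticeably higher than under independence, matching the pattern described in the abstract.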


2018 ◽  
Vol 7 (10) ◽  
pp. 409 ◽  
Author(s):  
Youqiang Dong ◽  
Ximin Cui ◽  
Li Zhang ◽  
Haibin Ai

The progressive TIN (triangular irregular network) densification (PTD) filter algorithm is widely used for filtering point clouds. In the PTD algorithm, the iterative densification parameters become smaller over the entire filtering process. As a result, the performance of the PTD algorithm, especially its type I errors, is poor for point clouds with high density and standard variance. Hence, an improved PTD filtering algorithm for point clouds with high density and variance is proposed in this paper. The improved PTD method divides the iterative densification process into two stages. In the first stage, the iterative densification process of the PTD algorithm is used, and the two densification parameters become smaller. When the density of points belonging to the TIN is higher than a certain value (in this paper, we define this density as the standard variance intervention density), the iterative densification process moves into the second stage. In the second stage, a new iterative densification strategy based on multiple scales is proposed, and the angle threshold becomes larger. The experimental results show that the improved PTD algorithm can effectively reduce the type I errors and total errors of the DIM point clouds by 7.53% and 4.09%, respectively, compared with the PTD algorithm. Although the type II errors increase slightly in our improved method, the wrongly added object points have little effect on the accuracy of the generated DSM. In short, our improved PTD method refines the classical PTD method and offers a better solution for filtering point clouds with high density and standard variance.
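The full two-stage method is more involved; as rough orientation only, the sketch below implements a heavily simplified, single-pass TIN densification step (grid-based seed selection plus one distance/angle test against the facet below each point). The thresholds, single resolution, and single iteration are our illustrative assumptions, not the authors' improved algorithm.

```python
import numpy as np
from scipy.spatial import Delaunay

def simple_tin_densification(points, cell=20.0, dist_thr=1.4, angle_thr=np.deg2rad(6)):
    """Simplified single-pass TIN densification sketch: seed a TIN with the
    lowest point per grid cell, then accept points whose distance and angle
    to the facet below them fall under the thresholds."""
    pts = np.asarray(points, dtype=float)               # columns: x, y, z
    # 1. seed points: lowest point in each planimetric grid cell
    keys = np.floor(pts[:, :2] / cell).astype(int)
    seeds = {}
    for i, k in enumerate(map(tuple, keys)):
        if k not in seeds or pts[i, 2] < pts[seeds[k], 2]:
            seeds[k] = i
    ground = set(seeds.values())

    # 2. one densification pass over the remaining points
    ground_idx = np.array(sorted(ground))
    tin = Delaunay(pts[ground_idx, :2])
    for i in range(len(pts)):
        if i in ground:
            continue
        s = tin.find_simplex(pts[i, :2])
        if s < 0:
            continue                                    # point lies outside the TIN
        tri = pts[ground_idx[tin.simplices[s]]]         # 3 facet vertices (x, y, z)
        normal = np.cross(tri[1] - tri[0], tri[2] - tri[0])
        normal /= np.linalg.norm(normal)
        dist = abs(np.dot(pts[i] - tri[0], normal))     # distance to the facet plane
        # largest angle between the facet plane and lines to the three vertices
        vecs = pts[i] - tri
        sines = np.abs(vecs @ normal) / np.linalg.norm(vecs, axis=1)
        if dist <= dist_thr and np.arcsin(sines).max() <= angle_thr:
            ground.add(i)
    return np.array(sorted(ground))                     # indices classified as ground
```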

