Multiple testing and Type I errors: A reply in defense of multifactor designs.

1974 ◽  
Vol 29 (10) ◽  
pp. 778-779 ◽  
Author(s):  
H. J. Keselman


Author(s):  
Jinsong Chen ◽  
Mark J. van der Laan ◽  
Martyn T. Smith ◽  
Alan E. Hubbard

Microarray studies often need to examine thousands of genes simultaneously to determine which are differentially expressed. One main challenge in such studies is to find suitable multiple testing procedures that provide accurate control of the error rates of interest while remaining as powerful as possible, that is, returning the longest list of truly interesting genes among competitors. Many multiple testing methods have been developed recently for microarray data analysis, especially resampling-based methods such as permutation methods, the null-centered and scaled bootstrap (NCSB) method, and the quantile-transformed-bootstrap-distribution (QTBD) method. Each of these methods has its own merits and limitations. Theoretically, permutation methods can fail to provide accurate control of Type I errors when the so-called subset pivotality condition is violated. The NCSB method does not suffer from that limitation, but an impractical number of bootstrap samples is often needed to obtain proper control of Type I errors. The newly developed QTBD method has the virtue of providing accurate control of Type I errors under few restrictions. However, the relative practical performance of these three types of multiple testing methods remains unresolved. This paper compares the three resampling-based methods with respect to control of the family-wise error rate (FWER) through data simulations. Results show that among the three resampling-based methods, the QTBD method provides relatively accurate and powerful control in more general circumstances.
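For orientation, the sketch below shows a generic Westfall-Young style maxT permutation procedure for FWER control in a two-group comparison. It relies on the subset pivotality condition discussed above and is not the NCSB or QTBD method; the function and argument names are ours.

```python
import numpy as np

def maxT_permutation_fwer(x, y, n_perm=1000, alpha=0.05, seed=0):
    """Generic Westfall-Young maxT permutation sketch (not NCSB or QTBD).

    x, y : arrays of shape (n_genes, n_samples_group1) and
           (n_genes, n_samples_group2) of expression values.
    Returns FWER-adjusted p-values and the rejection indicator.
    """
    rng = np.random.default_rng(seed)
    data = np.hstack([x, y])
    n1 = x.shape[1]

    def t_stats(mat):
        a, b = mat[:, :n1], mat[:, n1:]
        se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] +
                     b.var(axis=1, ddof=1) / b.shape[1])
        return (a.mean(axis=1) - b.mean(axis=1)) / se

    t_obs = np.abs(t_stats(data))
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(data.shape[1])            # relabel samples
        max_null[i] = np.abs(t_stats(data[:, perm])).max()

    # adjusted p-value: how often the permutation maximum exceeds each observed statistic
    p_adj = (max_null[None, :] >= t_obs[:, None]).mean(axis=1)
    return p_adj, p_adj <= alpha
```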


2004 ◽  
Vol 3 (1) ◽  
pp. 1-25 ◽  
Author(s):  
Mark J. van der Laan ◽  
Sandrine Dudoit ◽  
Katherine S. Pollard

This article shows that any single-step or stepwise multiple testing procedure (asymptotically) controlling the family-wise error rate (FWER) can be augmented into procedures that (asymptotically) control tail probabilities for the number of false positives and the proportion of false positives among the rejected hypotheses. Specifically, given any procedure that (asymptotically) controls the FWER at level alpha, we propose simple augmentation procedures that provide (asymptotic) level-alpha control of: (i) the generalized family-wise error rate, i.e., the tail probability, gFWER(k), that the number of Type I errors exceeds a user-supplied integer k, and (ii) the tail probability, TPPFP(q), that the proportion of Type I errors among the rejected hypotheses exceeds a user-supplied value 0 < q < 1.
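A minimal sketch of the gFWER(k) augmentation idea, assuming FWER-adjusted p-values are already in hand; identifiers are illustrative and the TPPFP augmentation is omitted.

```python
import numpy as np

def augment_to_gfwer(fwer_adjusted_pvals, alpha, k):
    """Augmentation sketch: start from the rejections of any level-alpha
    FWER-controlling procedure (encoded here as FWER-adjusted p-values)
    and additionally reject the k next most significant hypotheses.
    The augmented set controls gFWER(k) at level alpha. Function and
    argument names are illustrative, not from the article.
    """
    p = np.asarray(fwer_adjusted_pvals, dtype=float)
    rejected = set(np.flatnonzero(p <= alpha).tolist())     # FWER-level rejections
    extras = [int(i) for i in np.argsort(p) if i not in rejected][:k]
    return sorted(rejected.union(extras))                   # indices of rejected hypotheses
```

For example, with adjusted p-values [0.01, 0.03, 0.08, 0.20], alpha = 0.05 and k = 1, the base procedure rejects hypotheses 0 and 1 and the augmentation additionally rejects hypothesis 2.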


Author(s):  
Thomas Verron ◽  
Xavier Cahours ◽  
Stéphane Colard

Summary During the last two decades, tobacco product reporting requirements from regulators, such as those in Europe, Canada, and the USA, have increased. However, the capacity to compare and accurately discriminate between two products is affected by the number of constituents used for the comparison. Performing a large number of simultaneous independent hypothesis tests increases the probability of rejecting a null hypothesis when it should not be rejected, which virtually guarantees the presence of type I errors among the findings. Correction methods such as the Bonferroni and Benjamini-Hochberg procedures have been developed to overcome this issue. The performance of these methods was assessed by comparing identical tobacco products with data sets of different sizes. Results showed that multiple comparisons lead to erroneous conclusions if the risk of type I error is not corrected. Unfortunately, reducing the type I error rate reduces the statistical power of the tests. Consequently, strategies for dealing with multiplicity of data should provide a reasonable balance between testing requirements and the statistical power of differentiation. Multiple testing for product comparison is less of a problem if studies are restricted to the most relevant parameters for comparison.
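For context, minimal sketches of the two corrections mentioned above (function names are ours):

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H_i if p_i <= alpha / m; controls the FWER."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: reject the k smallest p-values, where k is the largest
    index with p_(k) <= k * alpha / m; controls the false discovery rate."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.flatnonzero(below)) + 1   # largest ordered index meeting the bound
        reject[order[:k]] = True
    return reject
```

With p-values [0.001, 0.02, 0.04] at alpha = 0.05, for instance, bonferroni rejects only the first hypothesis (0.001 <= 0.05/3), while benjamini_hochberg rejects all three, illustrating the power trade-off discussed above.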


2015 ◽  
Vol 23 (2) ◽  
pp. 306-312 ◽  
Author(s):  
Annie Franco ◽  
Neil Malhotra ◽  
Gabor Simonovits

The accuracy of published findings is compromised when researchers fail to report and adjust for multiple testing. Preregistration of studies and the requirement of preanalysis plans for publication are two proposed solutions to combat this problem. Some have raised concerns that such changes in research practice may hinder inductive learning. However, without knowing the extent of underreporting, it is difficult to assess the costs and benefits of institutional reforms. This paper examines published survey experiments conducted as part of the Time-sharing Experiments in the Social Sciences program, where the questionnaires are made publicly available, allowing us to compare planned design features against what is reported in published research. We find that: (1) 30% of papers report fewer experimental conditions in the published paper than in the questionnaire; (2) roughly 60% of papers report fewer outcome variables than are listed in the questionnaire; and (3) about 80% of papers fail to report all experimental conditions and outcomes. These findings suggest that published statistical tests understate the probability of type I errors.
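As a reminder of why unreported tests matter, a back-of-the-envelope calculation (assuming independent tests, each at alpha = 0.05) shows how quickly the chance of at least one false positive grows with the number of tests actually run:

```python
# Probability of at least one false positive when m independent tests are
# each run at alpha = 0.05 but only a subset is reported (illustrative only).
alpha = 0.05
for m in (1, 5, 10, 20):
    print(m, round(1 - (1 - alpha) ** m, 3))
# 1 0.05   5 0.226   10 0.401   20 0.642
```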


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 962 ◽  
Author(s):  
Judith ter Schure ◽  
Peter Grünwald

Studies accumulate over time and meta-analyses are mainly retrospective. These two characteristics introduce dependencies between the analysis time, at which a series of studies is up for meta-analysis, and results within the series. Dependencies introduce bias (Accumulation Bias) and invalidate the sampling distribution assumed for p-value tests, thus inflating type I errors. But dependencies are also inevitable, since for science to accumulate efficiently, new research needs to be informed by past results. Here, we investigate various ways in which time influences error control in meta-analysis testing. We introduce an Accumulation Bias Framework that allows us to model a wide variety of practically occurring dependencies, including study series accumulation, meta-analysis timing, and approaches to multiple testing in living systematic reviews. The strength of this framework is that it shows how all dependencies affect p-value-based tests in a similar manner. This leads to two main conclusions. First, Accumulation Bias is inevitable, and even if it can be approximated and accounted for, no valid p-value tests can be constructed. Second, tests based on likelihood ratios withstand Accumulation Bias: they provide bounds on error probabilities that remain valid despite the bias. We leave the reader with a choice between two proposals to consider time in error control: either treat individual (primary) studies and meta-analyses as two separate worlds, each with its own timing, or integrate individual studies into the meta-analysis world. Taking up likelihood ratios in either approach allows for valid tests that relate well to the accumulating nature of scientific knowledge. Likelihood ratios can be interpreted as betting profits, earned in previous studies and invested in new ones, while the meta-analyst is allowed to cash out at any time and advise against future studies.
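As a rough illustration of the likelihood-ratio approach described above, the sketch below multiplies per-study likelihood ratios for a simple-versus-simple normal-means setup and rejects once the running product exceeds 1/alpha. The setup, parameter names, and defaults are our illustrative assumptions, not the authors' framework.

```python
import numpy as np
from scipy.stats import norm

def lr_meta_test(study_means, n_per_study, sigma=1.0, delta=0.5, alpha=0.05):
    """Likelihood-ratio (e-value style) meta-analysis sketch: multiply
    per-study likelihood ratios of H1: mu = delta vs H0: mu = 0, computed
    from each study's sample mean, and reject once the running product
    reaches 1/alpha. By Ville's inequality, the type I error stays below
    alpha at any (data-dependent) analysis time, which is what makes
    likelihood-ratio tests robust to accumulation-type dependencies.
    """
    running = 1.0
    for xbar, n in zip(study_means, n_per_study):
        se = sigma / np.sqrt(n)
        lr = norm.pdf(xbar, loc=delta, scale=se) / norm.pdf(xbar, loc=0.0, scale=se)
        running *= lr
        if running >= 1 / alpha:          # the meta-analyst may "cash out" here
            return True, running
    return False, running
```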


Methodology ◽  
2015 ◽  
Vol 11 (3) ◽  
pp. 110-115 ◽  
Author(s):  
Rand R. Wilcox ◽  
Jinxia Ma

Abstract. The paper compares methods that allow both within-group and between-group heteroscedasticity when performing all pairwise comparisons of the least squares lines associated with J independent groups. The methods are based on a simple extension of results derived by Johansen (1980) and Welch (1938) in conjunction with the HC3 and HC4 estimators. The probability of one or more Type I errors is controlled using the improvement on the Bonferroni method derived by Hochberg (1988). Results are illustrated using data from the Well Elderly 2 study, which motivated this paper.
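For reference, the Hochberg (1988) step-up improvement on Bonferroni used to control the probability of one or more Type I errors can be sketched as follows; this is a generic implementation, not tied to the HC3/HC4-based test statistics in the paper, and the function name is ours.

```python
import numpy as np

def hochberg(pvals, alpha=0.05):
    """Hochberg (1988) step-up procedure (sketch): with ordered p-values
    p_(1) <= ... <= p_(m), find the largest i with p_(i) <= alpha / (m - i + 1)
    and reject H_(1), ..., H_(i). Never less powerful than Bonferroni while
    still controlling the probability of one or more Type I errors.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                      # ascending order of p-values
    reject = np.zeros(m, dtype=bool)
    for i in range(m, 0, -1):                  # step up from the largest p-value
        if p[order[i - 1]] <= alpha / (m - i + 1):
            reject[order[:i]] = True
            break
    return reject
```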


2020 ◽  
Vol 39 (3) ◽  
pp. 185-208
Author(s):  
Qiao Xu ◽  
Rachana Kalelkar

SUMMARY This paper examines whether inaccurate going-concern opinions negatively affect the audit office's reputation. Assuming that clients perceive the incidence of going-concern opinion errors as a systematic audit quality concern within the entire audit office, we expect these inaccuracies to impact the audit office market share and dismissal rate. We find that going-concern opinion inaccuracy is negatively associated with the audit office market share and is positively associated with the audit office dismissal rate. Furthermore, we find that the decline in market share and the increase in dismissal rate are primarily associated with Type I errors. Additional analyses reveal that the negative consequence of going-concern opinion inaccuracy is lower for Big 4 audit offices. Finally, we find that the decrease in the audit office market share is explained by the distressed clients' reactions to Type I errors and audit offices' lack of ability to attract new clients.


Biostatistics ◽  
2017 ◽  
Vol 18 (3) ◽  
pp. 477-494 ◽  
Author(s):  
Jakub Pecanka ◽  
Marianne A. Jonker ◽  
Zoltan Bochdanovits ◽  
Aad W. Van Der Vaart

Summary For over a decade, functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the "missing heritability" of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests, which results in a serious loss of statistical power as well as computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci while the more demanding score tests are restricted to the most promising pairs, a genome-wide association study (GWAS) for epistasis becomes feasible. By design, our method provides strong control of the type I error. Its favourable power properties, especially under the practically relevant misspecification of the interaction model, are illustrated. Ready-to-use software is available. Using the method we analyzed Parkinson's disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
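A minimal sketch of the two-stage screening idea described above, assuming 0/1/2 genotype coding. Note that we substitute a likelihood-ratio test for the paper's score test for brevity, and all names and thresholds are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency, chi2
import statsmodels.api as sm

def two_stage_epistasis(g1, g2, case, screen_alpha=1e-4):
    """Two-stage sketch: (1) screen a locus pair with a genotype-independence
    chi-square test in cases only; (2) only if the pair passes, test the
    interaction term in a logistic model of case status (likelihood-ratio
    test used here in place of the paper's score test).

    g1, g2 : genotype codes (0/1/2) per subject; case : 0/1 disease status.
    """
    g1, g2, case = map(np.asarray, (g1, g2, case))

    # Stage 1: 3x3 genotype contingency table among cases only
    cases = case == 1
    table = np.zeros((3, 3))
    for a, b in zip(g1[cases], g2[cases]):
        table[int(a), int(b)] += 1
    _, p_screen, _, _ = chi2_contingency(table + 0.5)   # small padding avoids empty cells
    if p_screen > screen_alpha:
        return None                                      # pair not promising, skip stage 2

    # Stage 2: interaction term in a logistic regression of case status
    X0 = sm.add_constant(np.column_stack([g1, g2]))
    X1 = sm.add_constant(np.column_stack([g1, g2, g1 * g2]))
    ll0 = sm.Logit(case, X0).fit(disp=0).llf
    ll1 = sm.Logit(case, X1).fit(disp=0).llf
    lr = 2 * (ll1 - ll0)
    return chi2.sf(lr, df=1)                             # interaction p-value
```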


2016 ◽  
Vol 27 (5) ◽  
pp. 1513-1530 ◽  
Author(s):  
Dena R Howard ◽  
Julia M Brown ◽  
Susan Todd ◽  
Walter M Gregory

Multi-arm clinical trials assessing multiple experimental treatments against a shared control group can offer efficiency advantages over independent trials through assessing an increased number of hypotheses. Published opinion is divided on the requirement for multiple testing adjustment to control the family-wise type I error rate (FWER). The probability of a false positive error in multi-arm trials compared to equivalent independent trials is affected by the correlation between comparisons due to sharing control data. We demonstrate that this correlation in fact leads to a reduction in the FWER; therefore, FWER adjustment is not recommended solely because control data are shared. In contrast, the correlation increases the probability of multiple false positive outcomes across the hypotheses, although standard FWER adjustment methods do not control for this. A stringent critical value adjustment is proposed to maintain equivalent evidence of superiority in two correlated comparisons to that obtained within independent trials. FWER adjustment is only required if there is an increased chance of making a single claim of effectiveness by testing multiple hypotheses, not because control data are shared. For competing experimental therapies, the correlation between comparisons can be advantageous, as it eliminates bias due to the experimental therapies being compared to different control populations.
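To illustrate the correlation effect described above, here is a small simulation sketch (our own, with illustrative sample sizes and one-sided z-tests) that estimates, under the global null, both the family-wise error rate and the probability that all shared-control comparisons are falsely significant.

```python
import numpy as np

def fwer_shared_control(n_sims=20000, n=100, arms=2, alpha_crit=1.645, seed=1):
    """Simulation sketch under the global null: 'arms' experimental groups
    are each compared with one shared control using one-sided z-tests on
    standard normal outcomes. Returns the estimated probability of at
    least one false positive and of all comparisons being falsely positive.
    All parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    control = rng.standard_normal((n_sims, n))
    exp = rng.standard_normal((n_sims, arms, n))
    # z statistic for each experimental arm versus the shared control group
    z = (exp.mean(axis=2) - control.mean(axis=1, keepdims=True)) / np.sqrt(2 / n)
    sig = z > alpha_crit
    return sig.any(axis=1).mean(), sig.all(axis=1).mean()
```

In runs of this kind, the chance of at least one false positive comes out slightly below what two independent trials would give, while the chance of both comparisons being falsely significant is noticeably higher than under independence, matching the pattern described in the abstract.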


2018 ◽  
Vol 7 (10) ◽  
pp. 409 ◽  
Author(s):  
Youqiang Dong ◽  
Ximin Cui ◽  
Li Zhang ◽  
Haibin Ai

The progressive TIN (triangular irregular network) densification (PTD) filter algorithm is widely used for filtering point clouds. In the PTD algorithm, the iterative densification parameters become smaller over the entire filtering process. As a result, the performance of the PTD algorithm, especially its type I errors, is poor for point clouds with high density and standard variance. Hence, an improved PTD filtering algorithm for point clouds with high density and variance is proposed in this paper. The improved PTD method divides the iterative densification process into two stages. In the first stage, the iterative densification process of the PTD algorithm is used, and the two densification parameters become smaller. When the density of points belonging to the TIN is higher than a certain value (in this paper, we define this density as the standard variance intervention density), the iterative densification process moves into the second stage. In the second stage, a new iterative densification strategy based on multiple scales is proposed, and the angle threshold becomes larger. The experimental results show that the improved PTD algorithm can effectively reduce the type I errors and total errors of the DIM point clouds by 7.53% and 4.09%, respectively, compared with the PTD algorithm. Although the type II errors increase slightly in our improved method, the wrongly added object points have little effect on the accuracy of the generated DSM. In short, our improved PTD method refines the classical PTD method and offers a better solution for filtering point clouds with high density and standard variance.
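The full two-stage method is more involved; as rough orientation only, the sketch below implements a heavily simplified, single-pass TIN densification step (grid-based seed selection plus one distance/angle test against the facet below each point). The thresholds, single resolution, and single iteration are our illustrative assumptions, not the authors' improved algorithm.

```python
import numpy as np
from scipy.spatial import Delaunay

def simple_tin_densification(points, cell=20.0, dist_thr=1.4, angle_thr=np.deg2rad(6)):
    """Simplified single-pass TIN densification sketch: seed a TIN with the
    lowest point per grid cell, then accept points whose distance and angle
    to the facet below them fall under the thresholds."""
    pts = np.asarray(points, dtype=float)               # columns: x, y, z
    # 1. seed points: lowest point in each planimetric grid cell
    keys = np.floor(pts[:, :2] / cell).astype(int)
    seeds = {}
    for i, k in enumerate(map(tuple, keys)):
        if k not in seeds or pts[i, 2] < pts[seeds[k], 2]:
            seeds[k] = i
    ground = set(seeds.values())

    # 2. one densification pass over the remaining points
    ground_idx = np.array(sorted(ground))
    tin = Delaunay(pts[ground_idx, :2])
    for i in range(len(pts)):
        if i in ground:
            continue
        s = tin.find_simplex(pts[i, :2])
        if s < 0:
            continue                                    # point lies outside the TIN
        tri = pts[ground_idx[tin.simplices[s]]]         # 3 facet vertices (x, y, z)
        normal = np.cross(tri[1] - tri[0], tri[2] - tri[0])
        normal /= np.linalg.norm(normal)
        dist = abs(np.dot(pts[i] - tri[0], normal))     # distance to the facet plane
        # largest angle between the facet plane and lines to the three vertices
        vecs = pts[i] - tri
        sines = np.abs(vecs @ normal) / np.linalg.norm(vecs, axis=1)
        if dist <= dist_thr and np.arcsin(sines).max() <= angle_thr:
            ground.add(i)
    return np.array(sorted(ground))                     # indices classified as ground
```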

