scholarly journals Maximum type I error rate inflation from sample size reassessment when investigators are blind to treatment labels

2015 ◽  
Vol 35 (12) ◽  
pp. 1972-1984 ◽  
Author(s):  
Magdalena Żebrowska ◽  
Martin Posch ◽  
Dominic Magirr
2021 ◽  
pp. 174077452110101
Author(s):  
Jennifer Proper ◽  
John Connett ◽  
Thomas Murray

Background: Bayesian response-adaptive designs, which data adaptively alter the allocation ratio in favor of the better performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating type I error rate, and increasing sample size requirements. The implementation of these designs using the Thompson sampling methods has generally assumed a simple beta-binomial probability model in the literature; however, the effect of these choices on the resulting design operating characteristics relative to other reasonable alternatives has not been fully examined. Motivated by the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss up to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial using the Thompson sampling methods based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and power, among other performance metrics. Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design in regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.


2011 ◽  
Vol 50 (03) ◽  
pp. 237-243 ◽  
Author(s):  
T. Friede ◽  
M. Kieser

SummaryObjectives: Analysis of covariance (ANCOVA) is widely applied in practice and its use is recommended by regulatory guidelines. However, the required sample size for ANCOVA depends on parameters that are usually uncertain in the planning phase of a study. Sample size recalculation within the internal pilot study design allows to cope with this problem. From a regulatory viewpoint it is preferable that the treatment group allocation remains masked and that the type I error is controlled at the specified significance level. The characteristics of blinded sample size reassessment for ANCOVA in non-inferiority studies have not been investigated yet. We propose an appropriate method and evaluate its performance.Methods: In a simulation study, the characteristics of the proposed method with respect to type I error rate, power and sample size are investigated. It is illustrated by a clinical trial example how strict control of the significance level can be achieved.Results: A slight excess of the type I error rate beyond the nominal significance level was observed. The extent of exceedance increases with increasing non-inferiority margin and increasing correlation between outcome and covariate. The procedure assures the desired power over a wide range of scenarios even if nuisance parameters affecting the sample size are initially mis-specified.Conclusions: The proposed blinded sample size recalculation procedure protects from insufficient sample sizes due to incorrect assumptions about nuisance parameters in the planning phase. The original procedure may lead to an elevated type I error rate, but methods are available to control the nominal significance level.


2019 ◽  
Vol 3 (Supplement_1) ◽  
Author(s):  
Keisuke Ejima ◽  
Andrew Brown ◽  
Daniel Smith ◽  
Ufuk Beyaztas ◽  
David Allison

Abstract Objectives Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved from various aspects, we focused on type I error rates and power of commonly used statistical analyses testing mean differences of two groups, using small (n ≤ 5) to moderate sample sizes. Methods We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether type I error rate could be affected by choice of statistical tests, we adjusted the empirical distributions of weights to ensure the null hypothesis (i.e., no mean difference) in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a ‘plasmode’ dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed type I error rates. Power was assessed using plasmodes, where the distribution of the control group was shifted by adding a constant value as in Case 1, but to realize nominal effect sizes. Results Type I error rates were unreasonably higher than the nominal significance level (type I error rate inflation) for Student's t-test, Welch's t-test and permutation especially when sample size was small for Case 1, whereas inflation was observed only for permutation for Case 2. Deflation was noted for bootstrap with small sample. Increasing sample size mitigated inflation and deflation, except for Wilcoxon in Case 1 because heterogeneity of weight distributions between groups violated assumptions for the purposes of testing mean differences. For power, a departure from the reference value was observed with small samples. Compared with the other tests, bootstrap was underpowered with small samples as a tradeoff for maintaining type I error rates. Conclusions With small samples (n ≤ 5), bootstrap avoided type I error rate inflation, but often at the cost of lower power. To avoid type I error rate inflation for other tests, sample size should be increased. Wilcoxon should be avoided because of heterogeneity of weight distributions between mutant and control mice. Funding Sources This study was supported in part by NIH and Japan Society for Promotion of Science (JSPS) KAKENHI grant.


2010 ◽  
Vol 49 (06) ◽  
pp. 618-624 ◽  
Author(s):  
H. Schmidli ◽  
T. Friede

Summary Background: In the planning of clinical trials with count outcomes such as the number of exacerbations in chronic obstructive pulmonary disease (COPD) often considerable uncertainty exists with regard to the overall event rate and the level of overdispersion which are both crucial for sample size calculations. Objectives: To develop a sample size reestimation strategy that maintains the blinding of the trial, controls the type I error rate and is robust against misspecification of the nuisance parameters in the planning phase in that the actual power is close to the target. Methods: The operation characteristics of the developed sample size reestimation procedure are investigated in a Monte Carlo simulation study. Results: Estimators of the overall event rate and the overdispersion parameter that do not require unblinding can be used to effectively adjust the sample size without inflating the type I error rate while providing power values close to the target. Conclusions: If only little information is available regarding the size of the overall event rate and the overdispersion parameter in the design phase of a trial, we recommend the use of a design with sample size reestimation as the one suggested here. Trials in COPD are expected to benefit from the proposed sample size reestimation strategy.


1976 ◽  
Vol 1 (2) ◽  
pp. 113-125 ◽  
Author(s):  
Paul A. Games ◽  
John F. Howell

Three different methods for testing all pairs of yȳk, - yȳk’ were contrasted under varying sample size (n) and variance conditions. With unequal n’s of six and up, only the Behrens-Fisher statistic provided satisfactory control of both the familywise rate of Type I errors and Type I error rate on each contrast. Satisfactory control with unequal n’s of three and up is dubious even with this statistic.


Sign in / Sign up

Export Citation Format

Share Document