Maximum type I error rate inflation from sample size reassessment when investigators are blind to treatment labels

Magdalena Żebrowska; Martin Posch; Dominic Magirr

doi:10.1002/sim.6848

Alternative models and randomization techniques for Bayesian response-adaptive randomization with binary outcomes

Clinical Trials ◽

10.1177/17407745211010139 ◽

2021 ◽

pp. 174077452110101

Author(s):

Jennifer Proper ◽

John Connett ◽

Thomas Murray

Keyword(s):

Logistic Regression ◽

Sample Size ◽

Error Rate ◽

Adaptive Design ◽

Type I Error ◽

Probability Model ◽

Binary Outcomes ◽

Type I ◽

Operating Characteristics ◽

Type I Error Rate

Background: Bayesian response-adaptive designs, which data-adaptively alter the allocation ratio in favor of the better performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating type I error rate, and increasing sample size requirements. The implementation of these designs using the Thompson sampling methods has generally assumed a simple beta-binomial probability model in the literature; however, the effect of these choices on the resulting design operating characteristics relative to other reasonable alternatives has not been fully examined. Motivated by the Advanced R2Eperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced R2Eperfusion STrategies for Refractory Cardiac Arrest trial using the Thompson sampling methods based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and power, among other performance metrics.
Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design in regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.
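The conventional beta-binomial implementation mentioned in the abstract can be illustrated with a minimal sketch. It assumes independent Beta(1, 1) priors on each arm's response rate and estimates the posterior probability that arm A is better than arm B, a quantity Thompson sampling commonly uses as the basis for the allocation probability. The function name `prob_a_better`, the priors, and the example counts are hypothetical and are not taken from the article, which may tune or transform this probability differently.

```python
import random


def prob_a_better(succ_a, fail_a, succ_b, fail_b, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(p_A > p_B | data).

    Assumes independent Beta(1, 1) priors, so each posterior is
    Beta(1 + successes, 1 + failures). This is an illustrative
    assumption, not the article's exact specification.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        p_a = rng.betavariate(1 + succ_a, 1 + fail_a)  # posterior draw, arm A
        p_b = rng.betavariate(1 + succ_b, 1 + fail_b)  # posterior draw, arm B
        if p_a > p_b:
            wins += 1
    return wins / n_draws


# Hypothetical interim data: (successes, failures) per arm.
balanced = prob_a_better(5, 5, 5, 5)      # identical data in both arms
lopsided = prob_a_better(18, 2, 5, 15)    # arm A clearly ahead
print(f"balanced data: P(A better) ~ {balanced:.3f}")
print(f"lopsided data: P(A better) ~ {lopsided:.3f}")
```

With identical data in both arms the estimate should sit near one half, while strongly lopsided data push it toward one; this swing in the allocation probability is what an urn or permuted block scheme would then track.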


Sample Size Reassessment in Non-inferiority Trials

Methods of Information in Medicine ◽

10.3414/me09-01-0063 ◽

2011 ◽

Vol 50 (03) ◽

pp. 237-243 ◽

Cited By ~ 10

Author(s):

T. Friede ◽

M. Kieser

Keyword(s):

Sample Size ◽

Error Rate ◽

Type I Error ◽

Nuisance Parameters ◽

Type I ◽

Nominal Significance Level ◽

Significance Level ◽

Type I Error Rate ◽

Planning Phase ◽

Nominal Significance

Summary Objectives: Analysis of covariance (ANCOVA) is widely applied in practice and its use is recommended by regulatory guidelines. However, the required sample size for ANCOVA depends on parameters that are usually uncertain in the planning phase of a study. Sample size recalculation within the internal pilot study design makes it possible to cope with this problem. From a regulatory viewpoint it is preferable that the treatment group allocation remains masked and that the type I error is controlled at the specified significance level. The characteristics of blinded sample size reassessment for ANCOVA in non-inferiority studies have not been investigated yet. We propose an appropriate method and evaluate its performance. Methods: In a simulation study, the characteristics of the proposed method with respect to type I error rate, power and sample size are investigated. A clinical trial example illustrates how strict control of the significance level can be achieved. Results: A slight excess of the type I error rate beyond the nominal significance level was observed. The excess increases with increasing non-inferiority margin and increasing correlation between outcome and covariate. The procedure assures the desired power over a wide range of scenarios even if nuisance parameters affecting the sample size are initially mis-specified. Conclusions: The proposed blinded sample size recalculation procedure protects from insufficient sample sizes due to incorrect assumptions about nuisance parameters in the planning phase. The original procedure may lead to an elevated type I error rate, but methods are available to control the nominal significance level.


Assessment of Type I Error Rates and Power of Common Analysis Methods in Murine Obesity-Related Study: ‘Plasmode-Based’ Simulation (P13-011-19)

Current Developments in Nutrition ◽

10.1093/cdn/nzz036.p13-011-19 ◽

2019 ◽

Vol 3 (Supplement_1) ◽

Author(s):

Keisuke Ejima ◽

Andrew Brown ◽

Daniel Smith ◽

Ufuk Beyaztas ◽

David Allison

Keyword(s):

Sample Size ◽

Error Rate ◽

Type I Error ◽

Error Rates ◽

T Test ◽

Small Samples ◽

Type I ◽

Type I Error Rates ◽

Type I Error Rate ◽

Weight Distributions

Abstract Objectives Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved from various aspects, we focused on type I error rates and power of commonly used statistical analyses testing mean differences of two groups, using small (n ≤ 5) to moderate sample sizes. Methods We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether type I error rate could be affected by choice of statistical tests, we adjusted the empirical distributions of weights to ensure the null hypothesis (i.e., no mean difference) in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a ‘plasmode’ dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed type I error rates. Power was assessed using plasmodes, where the distribution of the control group was shifted by adding a constant value as in Case 1, but to realize nominal effect sizes. Results Type I error rates were unreasonably higher than the nominal significance level (type I error rate inflation) for Student's t-test, Welch's t-test and permutation especially when sample size was small for Case 1, whereas inflation was observed only for permutation for Case 2. Deflation was noted for bootstrap with small sample. Increasing sample size mitigated inflation and deflation, except for Wilcoxon in Case 1 because heterogeneity of weight distributions between groups violated assumptions for the purposes of testing mean differences. For power, a departure from the reference value was observed with small samples. 
Compared with the other tests, bootstrap was underpowered with small samples as a tradeoff for maintaining type I error rates. Conclusions With small samples (n ≤ 5), bootstrap avoided type I error rate inflation, but often at the cost of lower power. To avoid type I error rate inflation for other tests, sample size should be increased. Wilcoxon should be avoided because of heterogeneity of weight distributions between mutant and control mice. Funding Sources This study was supported in part by NIH and Japan Society for Promotion of Science (JSPS) KAKENHI grant.
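The Case 2 construction described above (pool both groups into one distribution, resample two groups from it, and count how often a test rejects) can be sketched as follows. This is only an illustration under stated assumptions: the pooled weights are synthetic placeholders rather than the study's mouse data, resampling is done with replacement, only Student's t-test is shown, and the group size is fixed at 5 so that a single tabulated critical value (2.306 for 8 degrees of freedom at the two-sided 5% level) can be used. All names are hypothetical.

```python
import random
import statistics

N_PER_GROUP = 5      # fixed so the critical value below applies (df = 2 * 5 - 2 = 8)
T_CRIT_DF8 = 2.306   # two-sided 5% critical value of Student's t with 8 df


def student_t(x, y):
    """Equal-variance two-sample t statistic for two groups of equal size."""
    n = len(x)
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    if pooled_var == 0:
        return 0.0  # degenerate resample: treat as no evidence against the null
    return (statistics.mean(x) - statistics.mean(y)) / (2 * pooled_var / n) ** 0.5


def plasmode_type1_rate(pooled, n_reps=4000, seed=0):
    """Share of null resamples in which Student's t-test rejects at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        # Both groups come from the same pooled distribution, so the null holds.
        x = rng.choices(pooled, k=N_PER_GROUP)
        y = rng.choices(pooled, k=N_PER_GROUP)
        if abs(student_t(x, y)) > T_CRIT_DF8:
            rejections += 1
    return rejections / n_reps


# Synthetic stand-in for pooled control and mutant weights (grams); not real data.
data_rng = random.Random(42)
pooled_weights = [data_rng.gauss(25.0, 3.0) for _ in range(40)]

rate = plasmode_type1_rate(pooled_weights)
print(f"estimated type I error rate: {rate:.3f}")
```

Comparing the printed rate with the nominal 0.05 is the kind of check the abstract reports; repeating it for other tests and group sizes would need the matching reference distributions.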


Simple procedures for blinded sample size adjustment that do not affect the type I error rate

Statistics in Medicine ◽

10.1002/sim.1585 ◽

2003 ◽

Vol 22 (23) ◽

pp. 3571-3581 ◽

Cited By ~ 77

Author(s):

Meinhard Kieser ◽

Tim Friede

Keyword(s):

Sample Size ◽

Error Rate ◽

Type I Error ◽

Type I ◽

Type I Error Rate ◽

Size Adjustment ◽

Sample Size Adjustment


Sample size re-assessment leading to a raised sample size does not inflate type I error rate under mild conditions

BMC Medical Research Methodology ◽

10.1186/1471-2288-13-94 ◽

2013 ◽

Vol 13 (1) ◽

Cited By ~ 4

Author(s):

Per Broberg

Keyword(s):

Sample Size ◽

Error Rate ◽

Type I Error ◽

Type I ◽

Mild Conditions ◽

Type I Error Rate


Blinded Sample Size Reestimation with Negative Binomial Counts in Superiority and Non-inferiority Trials

Methods of Information in Medicine ◽

10.3414/me09-02-0060 ◽

2010 ◽

Vol 49 (06) ◽

pp. 618-624 ◽

Cited By ~ 28

Author(s):

H. Schmidli ◽

T. Friede

Keyword(s):

Sample Size ◽

Error Rate ◽

Negative Binomial ◽

Type I Error ◽

Event Rate ◽

Chronic Obstructive ◽

Type I ◽

Monte Carlo Simulation Study ◽

Type I Error Rate ◽

Overdispersion Parameter

Summary Background: In the planning of clinical trials with count outcomes such as the number of exacerbations in chronic obstructive pulmonary disease (COPD) often considerable uncertainty exists with regard to the overall event rate and the level of overdispersion which are both crucial for sample size calculations. Objectives: To develop a sample size reestimation strategy that maintains the blinding of the trial, controls the type I error rate and is robust against misspecification of the nuisance parameters in the planning phase in that the actual power is close to the target. Methods: The operation characteristics of the developed sample size reestimation procedure are investigated in a Monte Carlo simulation study. Results: Estimators of the overall event rate and the overdispersion parameter that do not require unblinding can be used to effectively adjust the sample size without inflating the type I error rate while providing power values close to the target. Conclusions: If only little information is available regarding the size of the overall event rate and the overdispersion parameter in the design phase of a trial, we recommend the use of a design with sample size reestimation as the one suggested here. Trials in COPD are expected to benefit from the proposed sample size reestimation strategy.


Re-calculating the sample size in internal pilot study designs with control of the type I error rate

Statistics in Medicine ◽

10.1002/(sici)1097-0258(20000415)19:7<901::aid-sim405>3.0.co;2-l ◽

2000 ◽

Vol 19 (7) ◽

pp. 901-911 ◽

Cited By ~ 84

Author(s):

Meinhard Kieser ◽

Tim Friede

Keyword(s):

Pilot Study ◽

Sample Size ◽

Error Rate ◽

Type I Error ◽

Type I ◽

Internal Pilot ◽

Type I Error Rate ◽

Study Designs ◽

Internal Pilot Study


Increasing the sample size during clinical trials with t-distributed test statistics without inflating the type I error rate

Statistics in Medicine ◽

10.1002/sim.2725 ◽

2007 ◽

Vol 26 (12) ◽

pp. 2449-2464 ◽

Cited By ~ 17

Author(s):

Nina Timmesfeld ◽

Helmut Schäfer ◽

Hans-Helge Müller

Keyword(s):

Clinical Trials ◽

Sample Size ◽

Error Rate ◽

Type I Error ◽

Type I ◽

Test Statistics ◽

Type I Error Rate ◽

Distributed Test


Sample size influenced inflation of the type I error rate when collapsing 2×k contingency tables

Journal of Clinical Epidemiology ◽

10.1016/j.jclinepi.2008.09.016 ◽

2009 ◽

Vol 62 (3) ◽

pp. 353-354

Author(s):

Peter C. Austin

Keyword(s):

Sample Size ◽

Error Rate ◽

Type I Error ◽

Contingency Tables ◽

Type I ◽

Type I Error Rate


Pairwise Multiple Comparison Procedures with Unequal N’s and/or Variances: A Monte Carlo Study

Journal of Educational Statistics ◽

10.3102/10769986001002113 ◽

1976 ◽

Vol 1 (2) ◽

pp. 113-125 ◽

Cited By ~ 85

Author(s):

Paul A. Games ◽

John F. Howell

Keyword(s):

Monte Carlo ◽

Sample Size ◽

Error Rate ◽

Type I Error ◽

Monte Carlo Study ◽

Multiple Comparison ◽

Type I ◽

Type I Errors ◽

Type I Error Rate ◽

Multiple Comparison Procedures

Three different methods for testing all pairs of mean differences ȳ_k − ȳ_k′ were contrasted under varying sample size (n) and variance conditions. With unequal n’s of six and up, only the Behrens-Fisher statistic provided satisfactory control of both the familywise rate of Type I errors and Type I error rate on each contrast. Satisfactory control with unequal n’s of three and up is dubious even with this statistic.
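The abstract does not spell out the Behrens-Fisher statistic. A minimal sketch of its usual form for one pair of groups is the unequal-variance (Welch) t statistic together with the Welch-Satterthwaite approximate degrees of freedom. The function name and the sample values are hypothetical, and the reference distribution the authors used to judge the statistic is not given in the abstract, so it is left out here.

```python
import statistics


def welch_statistic(x, y):
    """Unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    n_x, n_y = len(x), len(y)
    a = statistics.variance(x) / n_x   # squared standard error contribution, group x
    b = statistics.variance(y) / n_y   # squared standard error contribution, group y
    t = (statistics.mean(x) - statistics.mean(y)) / (a + b) ** 0.5
    df = (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))
    return t, df


# Hypothetical groups with unequal sizes and unequal variances.
group_k = [1, 2, 3, 4, 5]
group_k_prime = [2, 4, 6, 8, 10, 12]

t_stat, df = welch_statistic(group_k, group_k_prime)
print(f"t = {t_stat:.3f}, approximate df = {df:.2f}")
```

Applying this to every pair of groups gives the set of contrasts; controlling the familywise error rate across them requires an additional multiplicity adjustment that this sketch does not cover.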
