Maximum type 1 error rate inflation in multiarmed clinical trials with adaptive interim sample size modifications

2014 ◽  
Vol 56 (4) ◽  
pp. 614-630 ◽  
Author(s):  
Alexandra C. Graf ◽  
Peter Bauer ◽  
Ekkehard Glimm ◽  
Franz Koenig
1986 ◽  
Vol 20 (2) ◽  
pp. 189-200 ◽  
Author(s):  
Kevin D. Bird ◽  
Wayne Hall

Statistical power is neglected in much psychiatric research, with the consequence that many studies do not provide a reasonable chance of detecting differences between groups if they exist in the population. This paper attempts to improve current practice by providing an introduction to the essential quantities required for performing a power analysis (sample size, effect size, type 1 and type 2 error rates). We provide simplified tables for estimating the sample size required to detect a specified size of effect with a type 1 error rate of α and a type 2 error rate of β, and for estimating the power provided by a given sample size for detecting a specified size of effect with a type 1 error rate of α. We show how to modify these tables to perform power analyses for multiple comparisons in univariate and some multivariate designs. Power analyses for each of these types of design are illustrated by examples.
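As an illustration of how the four quantities named above relate to one another, the following is a minimal sketch, not the authors' tables. It assumes a two-sided, two-sample comparison of means with equal group sizes, a standardized effect size (Cohen's d), and the normal approximation; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Assumes a standardized effect size (Cohen's d) and the normal
    approximation, so results can be slightly smaller than t-based tables.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for type 1 error rate alpha
    z_beta = z.inv_cdf(power)           # quantile for power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def power_for_sample_size(effect_size, n_per_group, alpha=0.05):
    """Approximate power for a given n per group (far rejection tail ignored)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * math.sqrt(n_per_group / 2)
    return z.cdf(noncentrality - z_alpha)


print(sample_size_per_group(0.5))                # medium effect, alpha 0.05, power 0.80
print(round(power_for_sample_size(0.5, 63), 3))  # power delivered by 63 per group
```

One simple way to handle k comparisons, assumed here for illustration rather than taken from the paper, is a Bonferroni split: passing `alpha=0.05 / k` raises the required sample size accordingly.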


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 5130-5130
Author(s):  
G. A. Gignac ◽  
M. J. Morris ◽  
G. Heller ◽  
H. I. Scher

5130 Background: PFS has been proposed as an endpoint in prostate cancer because tumor regression cannot be assessed easily and the significance of post-therapy changes in PSA is uncertain. There is significant variability in the frequency by which outcomes are assessed across clinical trials. We sought to create a model that would define the degree of error in estimating PFS from this variability. Methods: A simulation experiment was performed. An exponential distribution was used to generate 100 progression times for 3 hypothetical risk cohorts: rapid (median PFS of 18 wks), intermediate (36 wks) and slow progressors (72 wks). We examined how reported PFS would change depending on 3 assessment schedules: every (q) 6, 8 and 12 wks for 48 wks each, then q6 months for 2 years. The logrank statistic was used to compare PFS between schedules. If different schedules have no impact the expected type 1 error rate should be 5% (where a difference in PFS time is detected when none existed). Each simulation was repeated 1000 times. Results: Nine pairwise comparisons were performed and Kaplan-Meier PFS estimates created for the 3 assessment schedules and 3 risk cohorts (see table 1). In the highest risk cohort, PFS of 18 wks, 38% of the time the logrank test showed a falsely prolonged PFS for pts assessed on the q12 vs 6 wks schedule. This type 1 error rate (by simulation) was reduced to 20% and 11% when the schedules were q8 vs 12 wks and q6 vs 8 wks, respectively, but remained above the 5% expected rate for type 1 error. For lower risk pts, PFS of 72 wks, the disparity in PFS times was diminished. Conclusions: Progression free survival is significantly skewed by the schedule of assessing treatment effects in clinical trials. This argues for uniformity in the timing of outcome assessments across trials and between arms in randomized trials. Grant support: 5T32CA09207 [Table: see text] No significant financial relationships to disclose.
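The mechanism behind this bias can be sketched in a few lines: progression is only recorded at the first scheduled visit on or after it truly occurs, so sparser schedules push recorded PFS later. The sketch below is an illustration under stated assumptions, not the authors' simulation. It takes "q6 months" to mean every 26 weeks, censors patients at the last visit, compares medians rather than running a logrank test, and uses hypothetical function names.

```python
import math
import random
import statistics

WEEKS_PER_6_MONTHS = 26  # assumption: "q6 months" taken as every 26 weeks


def visit_schedule(interval_weeks):
    """Visits every interval_weeks through week 48, then every 6 months for 2 years."""
    early = list(range(interval_weeks, 48 + 1, interval_weeks))
    late = [48 + WEEKS_PER_6_MONTHS * k for k in range(1, 5)]
    return early + late


def observed_pfs(true_time, schedule):
    """Return (recorded_time, progressed).

    Progression is recorded at the first visit on or after the true time;
    a patient who has not progressed by the last visit is censored there.
    """
    for visit in schedule:
        if visit >= true_time:
            return visit, True
    return schedule[-1], False


def simulate_cohort(median_weeks, n_patients, rng):
    """Draw exponential progression times with the given median."""
    rate = math.log(2) / median_weeks
    return [rng.expovariate(rate) for _ in range(n_patients)]


rng = random.Random(0)
true_times = simulate_cohort(18, 100, rng)  # rapid progressors, median PFS 18 wks
for interval in (6, 8, 12):
    schedule = visit_schedule(interval)
    recorded = [observed_pfs(t, schedule)[0] for t in true_times]
    print(f"q{interval} wks: median recorded PFS = {statistics.median(recorded)} wks")
```

Because every q12 visit is also a q6 visit under these assumptions, each patient's recorded time on the q12 schedule is never earlier than on the q6 schedule, even though the underlying progression times are identical.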


2016 ◽  
Vol 148 (8) ◽  
pp. 24-31
Author(s):  
Kayode Ayinde ◽  
John Olatunde ◽  
Gbenga Sunday

2019 ◽  
Vol 16 (6) ◽  
pp. 673-681 ◽  
Author(s):  
Edward L Korn ◽  
Robert J Gray ◽  
Boris Freidlin

Background: Nonadherence to treatment assignment in a noninferiority randomized trial is especially problematic because it attenuates observed differences between the treatment arms, possibly leading one to conclude erroneously that a truly inferior experimental therapy is noninferior to a standard therapy (inflated type 1 error probability). The Lachin–Foulkes adjustment is an increase in the sample size to account for random nonadherence for the design of a superiority trial with a time-to-event outcome; it has not been explored in the noninferiority trial setting nor with nonrandom nonadherence. Noninferiority trials where patients have knowledge of a personal prognostic risk score may lead to nonrandom nonadherence, as patients with a relatively high risk may be more likely to not adhere to the random assignment to the (reduced) experimental therapy, and patients with a relatively low risk score may be more likely to not adhere to the random assignment to the (more aggressive) standard therapy. Methods: We investigated via simulations the properties of the Lachin–Foulkes adjustment in the noninferiority setting. We considered nonrandom in addition to random nonadherence to the treatment assignment. For nonrandom nonadherence, we used the scenario where a risk score, potentially associated with the between-arm treatment difference, influences patients’ nonadherence. A sensitivity analysis is proposed for addressing the nonrandom nonadherence for this scenario. The noninferiority TAILORx adjuvant breast cancer trial, where eligibility was based on a genomic risk score, is used as an example throughout. Results: The Lachin–Foulkes adjustment to the sample size improves the operating characteristics of noninferiority trials with random nonadherence. However, to maintain type 1 error probability, it is critical to adjust the noninferiority margin as well as the sample size.
With nonrandom nonadherence that is associated with a prognostic risk score, the type 1 error probability of the Lachin–Foulkes adjustment can be inflated (e.g. doubled) when the nonadherence is larger in the experimental arm than the standard arm. The proposed sensitivity analysis lessens the inflation in this situation. Conclusion: The Lachin–Foulkes adjustment to the sample size and noninferiority margin is a useful simple technique for attenuating the effects of random nonadherence in the noninferiority setting. With nonrandom nonadherence associated with a risk score known to the patients, the type 1 error probability can be inflated in certain situations. A proposed sensitivity analysis for these situations can attenuate the inflation.


2017 ◽  
Vol 35 (15_suppl) ◽  
pp. TPS11081-TPS11081 ◽  
Author(s):  
Robin Lewis Jones ◽  
Steven Attia ◽  
Cyrus R. Mehta ◽  
Lingyun Liu ◽  
Kamalesh Kumar Sankhala ◽  
...  

TPS11081 Background: AAS is an aggressive soft tissue sarcoma (STS) of endothelial cell origin with an expected median overall survival of 8-12 months. Pazopanib (P) is approved for treatment of advanced STS following progression on chemotherapy. In a retrospective study of 40 AAS patients treated with single agent P the median PFS was 3.1 months and median OS 9.9 months with no complete responses. Endoglin is an essential angiogenic receptor expressed on AAS that is upregulated following VEGF inhibition, and TRC105, an endoglin antibody, given with P produced durable complete responses in AAS patients with median PFS of 5.6 months in refractory patients including those receiving prior P. The TAPPAS trial is the first randomized Phase 3 trial performed in AAS, and was initiated following protocol assistance from the EMA and Special Protocol Assessment from the FDA. Methods: TAPPAS is a randomized multicenter study of TRC105/P vs P alone in the United States and Europe that is actively enrolling cutaneous and non-cutaneous AAS patients and incorporates an adaptive enrichment design. Key inclusion criteria: 0, 1 or 2 prior lines of therapy, ECOG ≤ 1. Primary endpoint is PFS and secondary endpoints include ORR and OS. The initial sample size of 124 patients, followed until 95 PFS events, provides more than 80% power to detect a hazard ratio of 0.55. At the time of interim analysis, projected to occur upon the occurrence of 40 events in approximately 70 patients, the result will be classified as belonging to either the favorable, promising, enrichment or unfavorable zones, based on conditional power. The sample size and PFS events will be unchanged in the favorable and unfavorable zones, and will be increased to a total of 200 patients followed for 170 PFS events in the promising zone. The trial will enroll 100 additional patients, with cutaneous disease only, in the enrichment zone and will follow them until 110 events are observed in the total cutaneous population. 
An independent DMC will follow the trial for safety and futility. The adaptive design requires the enrollment of fewer patients, preserves type-1 error, and protects power to detect a clinically meaningful survival benefit. (NCT 02979899). Clinical trial information: NCT02979899.


2008 ◽  
Vol 27 (3) ◽  
pp. 371-381 ◽  
Author(s):  
Steven Snapinn ◽  
Qi Jiang

2018 ◽  
Vol 28 (6) ◽  
pp. 1879-1892 ◽  
Author(s):  
Alexandra Christine Graf ◽  
Gernot Wassmer ◽  
Tim Friede ◽  
Roland Gerard Gera ◽  
Martin Posch

With the advent of personalized medicine, clinical trials studying treatment effects in subpopulations are receiving increasing attention. The objectives of such studies are, besides demonstrating a treatment effect in the overall population, to identify subpopulations, based on biomarkers, where the treatment has a beneficial effect. Continuous biomarkers are often dichotomized using a threshold to define two subpopulations with low and high biomarker levels. If there is insufficient information on the dependence structure of the outcome on the biomarker, several thresholds may be investigated. The nested structure of such subpopulations is similar to the structure in group sequential trials. Therefore, it has been proposed to use the corresponding critical boundaries to test such nested subpopulations. We show that for biomarkers with a prognostic effect that is not adjusted for in the statistical model, the variability of the outcome may vary across subpopulations which may lead to an inflation of the family-wise type 1 error rate. Using simulations we quantify the potential inflation of testing procedures based on group sequential designs. Furthermore, alternative hypotheses tests that control the family-wise type 1 error rate under minimal assumptions are proposed. The methodological approaches are illustrated by a trial in depression.


2020 ◽  
Author(s):  
Guosheng Yin ◽  
Chenyang Zhang ◽  
Huaqing Jin

BACKGROUND Recently, three randomized clinical trials on coronavirus disease (COVID-19) treatments were completed: one for lopinavir-ritonavir and two for remdesivir. One trial reported that remdesivir was superior to placebo in shortening the time to recovery, while the other two showed no benefit of the treatment under investigation. OBJECTIVE The aim of this paper is to, from a statistical perspective, identify several key issues in the design and analysis of three COVID-19 trials and reanalyze the data from the cumulative incidence curves in the three trials using more appropriate statistical methods. METHODS The lopinavir-ritonavir trial enrolled 39 additional patients due to insignificant results after the sample size reached the planned number, which led to inflation of the type I error rate. The remdesivir trial of Wang et al failed to reach the planned sample size due to a lack of eligible patients, and the bootstrap method was used to predict the quantity of clinical interest conditionally and unconditionally if the trial had continued to reach the originally planned sample size. Moreover, we used a terminal (or cure) rate model and a model-free metric known as the restricted mean survival time or the restricted mean time to improvement (RMTI) to analyze the reconstructed data. The remdesivir trial of Beigel et al reported the median recovery time of the remdesivir and placebo groups, and the rate ratio for recovery, while both quantities depend on a particular time point representing local information. We use the restricted mean time to recovery (RMTR) as a global and robust measure for efficacy. RESULTS For the lopinavir-ritonavir trial, with the increase of sample size from 160 to 199, the type I error rate was inflated from 0.05 to 0.071. The difference of RMTIs between the two groups evaluated at day 28 was –1.67 days (95% CI –3.62 to 0.28; P=.09) in favor of lopinavir-ritonavir but not statistically significant.
For the remdesivir trial of Wang et al, the difference of RMTIs at day 28 was –0.89 days (95% CI –2.84 to 1.06; P=.37). The planned sample size was 453, yet only 236 patients were enrolled. The conditional prediction shows that the hazard ratio estimates would reach statistical significance if the target sample size had been maintained. For the remdesivir trial of Beigel et al, the difference of RMTRs between the remdesivir and placebo groups at day 30 was –2.7 days (95% CI –4.0 to –1.2; P<.001), confirming the superiority of remdesivir. The difference in the recovery time at the 25th percentile (95% CI –3 to 0; P=.65) was insignificant, while the differences became more statistically significant at larger percentiles. CONCLUSIONS Based on the statistical issues and lessons learned from the recent three clinical trials on COVID-19 treatments, we suggest more appropriate approaches for the design and analysis of ongoing and future COVID-19 trials.
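The inflation from extending enrollment after a nonsignificant result can be illustrated with a small Monte Carlo sketch. This is not the authors' calculation. It assumes a simplified one-sample z-statistic on unit-variance normal data under the null, a two-sided test at the planned size, and a second unadjusted test at the extended size only if the first was not significant. The function name is hypothetical.

```python
import math
import random
from statistics import NormalDist


def type1_error_with_extension(n_planned, n_extended, alpha=0.05,
                               n_sims=200_000, seed=0):
    """Estimate the overall type I error rate when a trial that is not
    significant at n_planned is extended to n_extended and tested again
    at the same unadjusted alpha."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    extra = n_extended - n_planned
    rejections = 0
    for _ in range(n_sims):
        # Under the null, the sum of n unit-variance observations is N(0, n).
        s1 = rng.gauss(0.0, math.sqrt(n_planned))
        if abs(s1) / math.sqrt(n_planned) > crit:
            rejections += 1  # significant at the planned size, trial stops
            continue
        # Not significant: enroll the extra patients and test once more.
        s2 = s1 + rng.gauss(0.0, math.sqrt(extra))
        if abs(s2) / math.sqrt(n_extended) > crit:
            rejections += 1
    return rejections / n_sims


print(type1_error_with_extension(160, 160))  # no extension: close to nominal 0.05
print(type1_error_with_extension(160, 199))  # extension after a nonsignificant result
```

Under these simplifying assumptions the second estimate is expected to sit clearly above the nominal 0.05 and in the general vicinity of the 0.071 reported above, although the paper's own derivation may rest on a different test statistic.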

