Sample size determinations using logistic regression with pilot data

Background: Bayesian response-adaptive designs, which data-adaptively alter the allocation ratio in favor of the better-performing treatment, are often criticized for three reasons: a non-trivial probability of a subject imbalance in favor of the inferior treatment, an inflated type I error rate, and increased sample size requirements. In the literature, these designs have generally been implemented with Thompson sampling under a simple beta-binomial probability model. However, the effect of these choices on the resulting design operating characteristics, relative to other reasonable alternatives, has not been fully examined. Motivated by the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations of the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss to what extent this solution works and when it does not.
Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial. The design used Thompson sampling based on a logistic regression probability model, coupled with either an urn or a permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates, for power, and for other performance metrics.
Results: Compared with the conventional beta-binomial probability model, the logistic regression probability model yields smaller average sample sizes with similar power, better control over the type I error rate, and more favorable treatment arm sample size distributions. Designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction.
Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design. The improvement holds for important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.
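
The conventional implementation criticized in this abstract can be sketched in a few lines. The Python below is a minimal sketch under several assumptions:
- independent Beta(1, 1) priors on each arm's response rate;
- a Monte Carlo estimate of the posterior probability that arm B is better;
- a tempering exponent `c`, a common Thompson-sampling tuning choice whose value here is hypothetical;
- a permuted block that fixes the B/A split within each block to the current target.

The function names, block size, and interim counts are illustrative, not taken from the article. The article's preferred design replaces the beta-binomial model with a logistic regression model, which this sketch does not implement.

```python
import random


def prob_b_better(succ_a, n_a, succ_b, n_b, draws=10_000, rng=None):
    """Monte Carlo estimate of P(p_B > p_A | data) under independent Beta(1, 1) priors."""
    rng = rng or random.Random()
    wins = 0
    for _ in range(draws):
        # Posterior for each arm is Beta(1 + successes, 1 + failures).
        p_a = rng.betavariate(1 + succ_a, 1 + n_a - succ_a)
        p_b = rng.betavariate(1 + succ_b, 1 + n_b - succ_b)
        wins += (p_b > p_a)
    return wins / draws


def allocation_prob(p_better, c=0.5):
    """Tempered Thompson-sampling target share for arm B (c = 1 is untempered)."""
    num = p_better ** c
    return num / (num + (1 - p_better) ** c)


def permuted_block(target_b, block_size, rng=None):
    """One block whose B/A split matches the target as closely as block_size allows.

    Fixing the split inside each block limits how far the realized allocation
    can drift from the target, unlike drawing each assignment independently.
    """
    rng = rng or random.Random()
    n_b = round(target_b * block_size)
    block = ["B"] * n_b + ["A"] * (block_size - n_b)
    rng.shuffle(block)
    return block


if __name__ == "__main__":
    rng = random.Random(2024)
    # Hypothetical interim data: 6/20 responders on A, 11/20 on B.
    p = prob_b_better(succ_a=6, n_a=20, succ_b=11, n_b=20, rng=rng)
    target = allocation_prob(p, c=0.5)
    print(f"P(B better) = {p:.2f}, target share for B = {target:.2f}")
    print(permuted_block(target, block_size=10, rng=rng))
```

Tempering with c < 1 pulls the target share toward 1/2. This is one common way to damp the early swings in allocation that drive the imbalance risk discussed above.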

Using Pilot Data to Estimate Sample Size and Compare Question Forms for a Crossover Study

Journal of Occupational Health

10.1539/joh.40.307

1998

Vol 40 (4)

pp. 307-312

Cited By ~ 1

Author(s):

Maxia Dong

Martin R. Petersen

Mark J. Mendell

Keyword(s):

Crossover Study

Sample Size

Pilot Data

Estimate Sample Size

Power and sample size for multivariate logistic modeling of unmatched case-control studies

Statistical Methods in Medical Research

10.1177/0962280217737157

2017

Vol 28 (3)

pp. 822-834

Author(s):

Mitchell H Gail

Sebastien Haneuse

Keyword(s):

Logistic Regression

Sample Size

Case Control

Simulation Methods

Case Control Studies

Control Data

Logistic Analysis

Sample Size Calculations

Control Designs

Univariate Analyses

Sample size calculations are needed to design and assess the feasibility of case-control studies. Although such calculations are readily available for simple case-control designs and univariate analyses, there is limited theory and software for multivariate unconditional logistic analysis of case-control data. Here we outline the theory needed to detect scalar exposure effects or scalar interactions while controlling for other covariates in logistic regression. Both analytical and simulation methods are presented, together with links to the corresponding software.
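
The simplest case in this abstract is a single binary exposure in an unmatched case-control study, for which power can be approximated analytically. The Python below sketches that textbook approximation; it is not Gail and Haneuse's method. It assumes a control exposure prevalence `p0` and derives the case exposure prevalence implied by the odds ratio. It then uses the large-sample (Woolf) variance of the log odds ratio. As a swapped-in heuristic for covariate adjustment, it inflates that variance by 1/(1 - r2), where `r2` is an assumed squared multiple correlation between the exposure and the other covariates. All function names and input values are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def case_prevalence(p0, odds_ratio):
    """Exposure prevalence among cases implied by control prevalence p0 and the odds ratio."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def power_case_control(n_cases, p0, odds_ratio, controls_per_case=1.0,
                       alpha=0.05, r2=0.0):
    """Approximate power of a two-sided Wald test of the log odds ratio.

    r2 is an assumed squared multiple correlation of the exposure with the
    adjustment covariates; the variance is inflated by 1 / (1 - r2).
    """
    p1 = case_prevalence(p0, odds_ratio)
    n_controls = controls_per_case * n_cases
    var_log_or = 1 / (n_cases * p1 * (1 - p1)) + 1 / (n_controls * p0 * (1 - p0))
    var_log_or /= 1 - r2
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Ignores the negligible probability of rejecting in the wrong direction.
    return z.cdf(abs(log(odds_ratio)) / sqrt(var_log_or) - z_crit)


def cases_needed(p0, odds_ratio, target_power=0.8, **kwargs):
    """Smallest number of cases whose approximate power reaches target_power."""
    n = 2
    while power_case_control(n, p0, odds_ratio, **kwargs) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical inputs: 20% exposure among controls, OR = 2, one control per case.
    print(cases_needed(p0=0.2, odds_ratio=2.0))
    print(cases_needed(p0=0.2, odds_ratio=2.0, controls_per_case=2.0))
    print(cases_needed(p0=0.2, odds_ratio=2.0, r2=0.3))
```

More controls per case reduce the required number of cases, while correlation between the exposure and the adjustment covariates increases it. The simulation methods the abstract mentions are needed when such closed-form shortcuts are too crude.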

A Comparison Study of Goodness of Fit Tests of Logistic Regression in R: Simulation and Application to Breast Cancer Data

Academic Journal of Applied Mathematical Sciences

10.32861/ajams.71.50.59

2020

pp. 50-59

Author(s):

El-Housainy A. Rady

Mohamed R. Abonazel

Mariam H. Metawe’e

Keyword(s):

Breast Cancer

Logistic Regression

Sample Size

Null Hypothesis

Goodness Of Fit

Quadratic Term

Breast Cancer Dataset

Cancer Data

Interaction Term

Test Package

Goodness-of-fit (GOF) tests of logistic regression assess whether the model is suitable for the data. The null hypothesis of all GOF tests is that the model fits. R, as free software, offers many GOF tests across different packages. A Monte Carlo simulation was conducted to study two situations. The first was the ability of each test, under its default settings, to accept the null hypothesis when the model truly fits. The second was the power of these tests when the assumption of a sufficient linear combination of the explanatory variables is violated (by omitting a linear covariate term, a quadratic term, or an interaction term). We also checked whether the same test in different R packages gave the same results. Because sample size is expected to affect the simulation results, the pattern of change in GOF test results was estimated under different sample sizes as well as different model settings. When the model truly fitted, all tests accepted the null hypothesis (in more than 95% of simulation trials), with two exceptions. These were the modified Hosmer-Lemeshow test in the "LogisticDx" package, under all model settings, and Osius and Rojek's (OsRo) test when the true model had an interaction term between binary and categorical covariates. In addition, the le Cessie-van Houwelingen-Copas-Hosmer unweighted sum of squares (CHCH) test gave unexpectedly different results across packages. Concerning the power study, all tests had very low power when the departure was a missing covariate. Generally, Stukel's test (package "LogisticDx") and the CHCH test (package "rms") reached power greater than 80% for detecting a missing quadratic term at smaller sample sizes. The OsRo test (package "LogisticDx") was better at detecting a missing interaction term. Besides the simulation study, we evaluated the performance of the GOF tests using the breast cancer dataset.
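
The Python below is a minimal sketch of the classical Hosmer-Lemeshow statistic, the grouping-based test behind several of the GOF tests compared above. It assumes fitted probabilities strictly between 0 and 1 and a simple equal-count split of the sorted probabilities into `groups` bins. It is not the code of any R package named above. Packages differ in choices like how ties and bin boundaries are handled, which is one plausible reason the same nominal test can give different results across packages. For self-containment, the p-value uses the closed-form chi-square tail. That form is valid only for even degrees of freedom (groups - 2 = 8 for the usual 10 groups). The function names and simulated data are hypothetical.

```python
import random
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2) from outcomes y and fitted p."""
    pairs = sorted(zip(p, y))  # sort observations by fitted probability
    n = len(pairs)
    bounds = [round(i * n / groups) for i in range(groups + 1)]
    stat = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        chunk = pairs[lo:hi]
        if not chunk:
            continue
        n_k = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in the bin
        expected = sum(pi for pi, _ in chunk)   # expected events in the bin
        mean_p = expected / n_k
        stat += (observed - expected) ** 2 / (n_k * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


if __name__ == "__main__":
    rng = random.Random(7)
    # Simulated data whose fitted probabilities are the true ones, so the
    # model "truly fits" and large p-values are expected most of the time.
    p = [rng.uniform(0.05, 0.95) for _ in range(500)]
    y = [1 if rng.random() < pi else 0 for pi in p]
    stat, p_value = hosmer_lemeshow(y, p)
    print(f"HL statistic = {stat:.2f}, p = {p_value:.3f}")
```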

Sample size and optimal design for logistic regression with binary interaction

Statistics in Medicine

10.1002/sim.2980

2007

Vol 27 (1)

pp. 36-46

Cited By ~ 88

Author(s):

Eugene Demidenko

Keyword(s):

Logistic Regression

Optimal Design

Sample Size

Binary Interaction

Sample Size for Logistic Regression with Small Response Probability

Journal of the American Statistical Association

10.1080/01621459.1981.10477597

1981

Vol 76 (373)

pp. 27-32

Cited By ~ 73

Author(s):

Alice S. Whittemore

Keyword(s):

Logistic Regression

Sample Size

Response Probability

Smoking and substance abuse prevalence in adolescents in a city of Turkey

European Journal of Public Health

10.1093/eurpub/ckz186.124

2019

Vol 29 (Supplement_4)

Author(s):

B Mete

E Pehlivan

V Söyiler

Keyword(s):

Substance Abuse

Substance Use

Logistic Regression

High School Students

Sample Size

Female Students

Male Students

Size Analysis

Addictive Substance

Abuse Risk

Abstract
Background: The aim of this study was to determine the prevalence of smoking and substance abuse among young people aged 14-18 in a city of Turkey, and to determine the relationship between smoking and substance abuse risk.
Methods: This cross-sectional study was conducted on high school students in Bingöl city center. The study population consists of 14,000 students in 14 high schools. The sample size analysis, based on 80% power and a 99% confidence level, gave a minimum required sample size of 1235. Using stratified sampling, students were randomly selected in the schools, and questionnaires were administered under supervision after obtaining their consent. Chi-square tests and binary logistic regression were used for data analysis.
Results: The mean age of the students was 15.71 ± 1.16 years (min-max: 14-18), and 49.5% were male. The prevalence of smoking among all students is 15.8%, and the frequency of addictive substance use or trial, excluding smoking, is 5%. The prevalence of smoking is 24.1% among male students and 7.7% among female students. Excluding smoking, the rate of addictive substance use was 8.2% for male students and 1.9% for female students. According to the logistic regression results, the risk of substance abuse increases 8-fold (95% CI: 3.32-19.95) in smokers (p = 0.001) and 2.5-fold (95% CI: 1.10-5.38) in men (p = 0.027). The risk of substance use increases 1.05-fold (95% CI: 1.02-1.08) as the number of cigarettes smoked daily increases (p = 0.001). The substance abuse risk of 18-year-olds is 1.5-fold (95% CI: 1.06-1.93) that of 14-year-olds (p = 0.021).
Conclusions: Smoking and addictive substance use in adolescents are particularly notable among male students (8.2%). This result is higher than the figure reported for İstanbul (7%), which may be because the province lies at a crossing point of drug trafficking routes. Smoking increases the risk of using other addictive substances (marijuana, heroin, etc.).
Key messages: According to this study, smoking and substance abuse are important health problems in adolescents. Male students smoke more than female students and are at greater risk of substance abuse.
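
Logistic regression odds ratios and their 95% confidence intervals are usually obtained by exponentiating a coefficient and its Wald interval, so the interval is symmetric on the log scale. The Python below is a small sketch, assuming Wald intervals (the abstract does not state the interval method). It recovers the implied coefficient and standard error from a reported interval and rebuilds the interval from them. Applied to the smoking estimate (95% CI 3.32 to 19.95), the implied point estimate is about 8.1, consistent with the reported 8-fold increase after rounding. The function names are illustrative.

```python
from math import exp, log
from statistics import NormalDist


def z_for(level):
    """Two-sided standard normal critical value (about 1.96 for level = 0.95)."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def wald_or_ci(beta, se, level=0.95):
    """Odds ratio and Wald interval from a logistic coefficient and its standard error."""
    z = z_for(level)
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


def beta_se_from_ci(lower, upper, level=0.95):
    """Invert a reported Wald OR interval: log-scale midpoint and half-width / z."""
    z = z_for(level)
    beta = (log(lower) + log(upper)) / 2
    se = (log(upper) - log(lower)) / (2 * z)
    return beta, se


if __name__ == "__main__":
    # Reported estimate for smokers: OR 8 (95% CI 3.32 to 19.95).
    beta, se = beta_se_from_ci(3.32, 19.95)
    odds_ratio, lo, hi = wald_or_ci(beta, se)
    print(f"beta = {beta:.3f}, se = {se:.3f}, OR = {odds_ratio:.2f} ({lo:.2f} to {hi:.2f})")
```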

Bias in odds ratios by logistic regression modelling and sample size

BMC Medical Research Methodology

10.1186/1471-2288-9-56

2009

Vol 9 (1)

Cited By ~ 167

Author(s):

Szilard Nemes

Junmei Miao Jonasson

Anna Genell

Gunnar Steineck

Keyword(s):

Logistic Regression

Sample Size

Odds Ratios

Regression Modelling

Logistic Regression Modelling

A Simulation Study to Assess the Effect of the Number of Response Categories on the Power of Ordinal Logistic Regression for Differential Item Functioning Analysis in Rating Scales

Computational and Mathematical Methods in Medicine

10.1155/2016/5080826

2016

Vol 2016

pp. 1-8

Cited By ~ 3

Author(s):

Elahe Allahyari

Peyman Jafari

Zahra Bagheri

Keyword(s):

Logistic Regression

Sample Size

Differential Item Functioning

Rating Scales

Error Rates

Ordinal Logistic Regression

Type I

Item Functioning

Quality Of Life Scale

The Impact

Objective. The present study uses simulated data to determine the optimal number of response categories needed to achieve adequate power with the ordinal logistic regression (OLR) model for differential item functioning (DIF) analysis in psychometric research. Methods. A hypothetical ten-item quality of life scale with three, four, and five response categories was simulated. The power and type I error rates of the OLR model for detecting uniform DIF were investigated under different combinations of ability distribution (θ), sample size, sample size ratio, and magnitude of uniform DIF across reference and focal groups. Results. When θ was distributed identically in the reference and focal groups, increasing the number of response categories from 3 to 5 increased the power of the OLR model for detecting uniform DIF by approximately 8%. The power of OLR was less than 0.36 when the ability distributions in the reference and focal groups were highly skewed to the left and right, respectively. Conclusions. The clearest conclusion from this research is that the minimum number of response categories for DIF analysis using OLR is five. However, the impact of the number of response categories on detecting DIF was smaller than might be expected.
