The Effect of Small Sample Size on Measurement Equivalence of Psychometric Questionnaires in MIMIC Model: A Simulation Study

BioMed Research International ◽

10.1155/2017/7596101 ◽

2017 ◽

Vol 2017 ◽

pp. 1-12

Author(s):

Jamshid Jamali ◽

Seyyed Mohammad Taghi Ayatollahi ◽

Peyman Jafari

Keyword(s):

Sample Size ◽

Type I Error ◽

Measurement Equivalence ◽

Latent Trait ◽

Type I ◽

Mimic Model ◽

Latent Construct ◽

Trait Distribution ◽

Latent Trait Distribution ◽

Scale Length

Evaluating measurement equivalence (also known as differential item functioning (DIF)) is an important part of the process of validating psychometric questionnaires. This study aimed at evaluating the multiple indicators multiple causes (MIMIC) model for DIF detection when the latent construct distribution is nonnormal and the focal group sample size is small. In this simulation-based study, Type I error rates and power of the MIMIC model for detecting uniform-DIF were investigated under different combinations of reference to focal group sample size ratio, magnitude of the uniform-DIF effect, scale length, the number of response categories, and latent trait distribution. Moderate and high skewness in the latent trait distribution decreased the power of the MIMIC model for detecting uniform-DIF by 0.33% and 0.47%, respectively. Increasing the scale length, the number of response categories, and the magnitude of DIF improved the power of the MIMIC model by 3.47%, 4.83%, and 20.35%, respectively, and decreased its Type I error by 2.81%, 5.66%, and 0.04%, respectively. This study revealed that the power of the MIMIC model was at an acceptable level when latent trait distributions were skewed. However, the empirical Type I error rate was slightly greater than the nominal significance level. Consequently, the MIMIC model was recommended for detection of uniform-DIF when the latent construct distribution is nonnormal and the focal group sample size is small.

Alternative models and randomization techniques for Bayesian response-adaptive randomization with binary outcomes

Clinical Trials ◽

10.1177/17407745211010139 ◽

2021 ◽

pp. 174077452110101

Author(s):

Jennifer Proper ◽

John Connett ◽

Thomas Murray

Keyword(s):

Logistic Regression ◽

Sample Size ◽

Error Rate ◽

Adaptive Design ◽

Type I Error ◽

Probability Model ◽

Binary Outcomes ◽

Type I ◽

Operating Characteristics ◽

Type I Error Rate

Background: Bayesian response-adaptive designs, which data-adaptively alter the allocation ratio in favor of the better performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating type I error rate, and increasing sample size requirements. The implementation of these designs using the Thompson sampling methods has generally assumed a simple beta-binomial probability model in the literature; however, the effect of these choices on the resulting design operating characteristics relative to other reasonable alternatives has not been fully examined. Motivated by the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial using the Thompson sampling methods based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and power, among other performance metrics.
Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design in regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.
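For readers unfamiliar with the conventional approach the abstract compares against, the following is a minimal sketch of Thompson sampling for two arms under a beta-binomial model. It is an illustration only, not the authors' implementation: the Beta(1, 1) priors, the function names, and the Monte Carlo draw count are all assumptions, and the logistic model and the urn or permuted block methods studied in the paper are not shown.

```python
import random

def prob_a_better(succ_a, fail_a, succ_b, fail_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(p_A > p_B | data).

    Assumes independent Beta(1, 1) priors on each arm's response rate
    (an illustrative choice, not taken from the paper).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(1 + succ_a, 1 + fail_a)
        p_b = rng.betavariate(1 + succ_b, 1 + fail_b)
        if p_a > p_b:
            wins += 1
    return wins / draws

def allocate(succ_a, fail_a, succ_b, fail_b, rng):
    """Thompson sampling step: draw once from each posterior and
    assign the next subject to the arm with the larger draw."""
    p_a = rng.betavariate(1 + succ_a, 1 + fail_a)
    p_b = rng.betavariate(1 + succ_b, 1 + fail_b)
    return "A" if p_a > p_b else "B"
```

Because each subject is assigned by an independent random draw, the realized allocation can drift away from the target ratio, which is the kind of imbalance the alternative randomization methods in the abstract are meant to limit.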

Increasing the sample size at interim for a two-sample experiment without Type I error inflation

Pharmaceutical Statistics ◽

10.1002/pst.390 ◽

2009 ◽

Vol 9 (4) ◽

pp. 280-287 ◽

Cited By ~ 1

Author(s):

Keith Dunnigan ◽

Dennis W. King

Keyword(s):

Sample Size ◽

Type I Error ◽

Type I ◽

Sample Experiment

A Multi-faceted Mess: A Review of Statistical Power Analysis in Psychology Journal Articles

10.31234/osf.io/3bdfu ◽

2019 ◽

Cited By ~ 2

Author(s):

Rob Cribbie ◽

Nataly Beribisky ◽

Udi Alter

Keyword(s):

Sample Size ◽

Effect Size ◽

Power Analysis ◽

Statistical Power ◽

Type I Error ◽

A Priori ◽

Type I ◽

Specific Level ◽

Maximum Sample Size ◽

Power Analyses

Many bodies recommend that a sample planning procedure, such as traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, there are several drawbacks to the procedure that render it “a mess.” Specifically, the identification of the minimally meaningful effect size is often difficult but unavoidable for conducting the procedure properly, the procedure is not precision oriented, and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether these issues are concerns in practice. To investigate how power analysis is currently used, this study reviewed the reporting of 443 power analyses in high impact psychology journals in 2016 and 2017. It was found that researchers rarely use the minimally meaningful effect size as a rationale for the chosen effect in a power analysis. Further, precision-based approaches and collecting the maximum sample size feasible are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers should focus on tools beyond traditional power analysis when sample planning, such as collecting the maximum sample size feasible.

Required sample size for comparing two independent means

Marine Medicine ◽

10.22328/2413-5747-2020-6-2-106-113 ◽

2020 ◽

Vol 6 (2) ◽

pp. 106-113

Author(s):

A. M. Grjibovski ◽

M. A. Gorbatova ◽

A. N. Narkevich ◽

K. A. Vinogradov

Keyword(s):

Sample Size ◽

Statistical Power ◽

Type I Error ◽

Sample Size Calculation ◽

Biomedical Literature ◽

Type I ◽

Research Practice ◽

False Null Hypothesis ◽

Different Levels ◽

Russian Research

Sample size calculation in the planning phase is still uncommon in Russian research practice. This situation threatens the validity of the conclusions and may introduce Type II error, in which a false null hypothesis is accepted because of a lack of statistical power to detect the existing difference between the means. Comparing two means using unpaired Student's t-tests is the most common statistical procedure in the Russian biomedical literature. However, calculations of the minimal required sample size or retrospective calculation of the statistical power were observed only in very few publications. In this paper we demonstrate how to calculate the required sample size for comparing means in unpaired samples using WinPepi and Stata software. In addition, we produced tables for the minimal required sample size for studies in which two means have to be compared and body mass index and blood pressure are the variables of interest. The tables were constructed for unpaired samples for different levels of statistical power and standard deviations obtained from the literature.
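As a rough illustration of the kind of calculation the abstract describes, the sketch below uses the standard normal-approximation formula for two independent means with equal group sizes, n = 2 (z_{1-α/2} + z_{1-β})² σ² / δ² per group. It does not reproduce WinPepi or Stata output; the function name and the default α and power are assumptions, and t-based software typically returns a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison
    of two independent means with a common standard deviation.

    delta: smallest difference between means worth detecting
    sd:    assumed common standard deviation
    """
    z = NormalDist()                       # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)     # two-sided critical value
    z_beta = z.inv_cdf(power)              # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)
```

For example, `n_per_group(5, 10)` corresponds to a standardized difference of 0.5 at 80% power and a two-sided 5% significance level.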

Optimized adaptive enrichment designs for three-arm trials: learning which subpopulations benefit from different treatments

Biostatistics ◽

10.1093/biostatistics/kxz030 ◽

2019 ◽

Author(s):

Jon Arni Steingrimsson ◽

Joshua Betz ◽

Tianchen Qian ◽

Michael Rosenblum

Keyword(s):

Sample Size ◽

Disease Severity ◽

Adaptive Design ◽

Type I Error ◽

Type I ◽

Expected Sample Size ◽

Common Control ◽

Enrichment Designs ◽

Adaptive Enrichment Designs ◽

Standard Designs

Summary: We consider the problem of designing a confirmatory randomized trial for comparing two treatments versus a common control in two disjoint subpopulations. The subpopulations could be defined in terms of a biomarker or disease severity measured at baseline. The goal is to determine which treatments benefit which subpopulations. We develop a new class of adaptive enrichment designs tailored to solving this problem. Adaptive enrichment designs involve a preplanned rule for modifying enrollment based on accruing data in an ongoing trial. At the interim analysis after each stage, for each subpopulation, the preplanned rule may decide to stop enrollment or to stop randomizing participants to one or more study arms. The motivation for this adaptive feature is that interim data may indicate that a subpopulation, such as those with lower disease severity at baseline, is unlikely to benefit from a particular treatment while uncertainty remains for the other treatment and/or subpopulation. We optimize these adaptive designs to have the minimum expected sample size under power and Type I error constraints. We compare the performance of the optimized adaptive design versus an optimized nonadaptive (single stage) design. Our approach is demonstrated in simulation studies that mimic features of a completed trial of a medical device for treating heart failure. The optimized adaptive design has 25% smaller expected sample size compared to the optimized nonadaptive design; however, the cost is that the optimized adaptive design has 8% greater maximum sample size. Open-source software that implements the trial design optimization is provided, allowing users to investigate the tradeoffs in using the proposed adaptive versus standard designs.

Comparison of a Two-Stage and Three-Stage Interim-Analysis Procedure

Psychological Reports ◽

10.2466/pr0.1992.71.1.3 ◽

1992 ◽

Vol 71 (1) ◽

pp. 3-14 ◽

Cited By ~ 1

Author(s):

John E. Overall ◽

Robert S. Atlas

Keyword(s):

Sample Size ◽

Interim Analysis ◽

Type I Error ◽

Substantial Reduction ◽

Error Rates ◽

Sampling Plan ◽

Type I ◽

Two Stage ◽

Expected Sample Size ◽

Analysis Strategy

A statistical model for combining p values from multiple tests of significance is used to define rejection and acceptance regions for two-stage and three-stage sampling plans. Type I error rates, power, frequencies of early termination decisions, and expected sample sizes are compared. Both the two-stage and three-stage procedures provide appropriate protection against Type I errors. The two-stage sampling plan with its single interim analysis entails minimal loss in power and provides substantial reduction in expected sample size as compared with a conventional single end-of-study test of significance for which power is in the adequate range. The three-stage sampling plan with its two interim analyses introduces somewhat greater reduction in power, but it compensates with greater reduction in expected sample size. Either interim-analysis strategy is more efficient than a single end-of-study analysis in terms of power per unit of sample size.
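The abstract does not say which model is used to combine p values across stages. Purely as an illustration of the general idea, the sketch below uses Fisher's combined probability test as an assumed stand-in; the function name is hypothetical, and the rejection and acceptance regions of the paper's sampling plans are not reproduced.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the null.
    For even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    Every p value must be strictly greater than 0.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)   # X / 2
    tail = sum(half_stat ** j / math.factorial(j) for j in range(k))
    return math.exp(-half_stat) * tail
```

With two stages, the combined value reduces to p1 p2 (1 - ln(p1 p2)), so two individually borderline results can jointly give stronger evidence than either alone.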

Simulation Studies of the Effects of Small Sample Size and Studied Item Parameters on SIBTEST and Mantel-Haenszel Type I Error Performance

Journal of Educational Measurement ◽

10.1111/j.1745-3984.1996.tb00490.x ◽

1996 ◽

Vol 33 (2) ◽

pp. 215-230 ◽

Cited By ~ 134

Author(s):

Louis A. Roussos ◽

William F. Stout

Keyword(s):

Sample Size ◽

Type I Error ◽

Small Sample Size ◽

Small Sample ◽

Type I ◽

Simulation Studies ◽

Error Performance ◽

Item Parameters

Two-sample binary phase 2 trials with low type I error and low sample size

Statistics in Medicine ◽

10.1002/sim.7226 ◽

2017 ◽

Vol 36 (9) ◽

pp. 1383-1394 ◽

Cited By ~ 2

Author(s):

Samuel Litwin ◽

Stanley Basickes ◽

Eric A. Ross

Keyword(s):

Sample Size ◽

Type I Error ◽

Type I ◽

Phase 2 ◽

Binary Phase

Use of interval estimations in design and evaluation of multiregional clinical trials with continuous outcomes

Statistical Methods in Medical Research ◽

10.1177/0962280217751277 ◽

2018 ◽

Vol 28 (7) ◽

pp. 2179-2195 ◽

Cited By ~ 1

Author(s):

Chieh Chiang ◽

Chin-Fu Hsiao

Keyword(s):

Clinical Trials ◽

Sample Size ◽

Type I Error ◽

Interval Estimation ◽

Error Rates ◽

New Drugs ◽

Sample Size Determination ◽

Type I ◽

Size Determination ◽

Interval Estimators

Multiregional clinical trials have been accepted in recent years as a useful means of accelerating the development of new drugs and abridging their approval time. The statistical properties of multiregional clinical trials are being widely discussed. In practice, the variance of a continuous response may differ from region to region, which turns the assessment of the efficacy response into a Behrens–Fisher problem: there is no exact test or interval estimator for a mean difference with unequal variances. As a solution, this study applies interval estimations of the efficacy response based on Howe’s, Cochran–Cox’s, and Satterthwaite’s approximations, which have been shown to have well-controlled type I error rates. However, the traditional sample size determination cannot be applied to the interval estimators. The sample size determination to achieve a desired power based on these interval estimators is then presented. Moreover, the consistency criteria suggested by the Japanese Ministry of Health, Labour and Welfare guidance were also applied to the overall results from the multiregional clinical trial obtained via the proposed interval estimation. A real example is used to illustrate the proposed method. The results of simulation studies indicate that the proposed method can correctly determine the required sample size and evaluate the assurance probability of the consistency criteria.
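Of the three approximations named in the abstract, Satterthwaite's is the easiest to show compactly. The sketch below computes the usual Welch-Satterthwaite approximate degrees of freedom for a difference of two means with unequal variances. It is a generic textbook formula with a hypothetical function name, not the paper's multiregional procedure or its sample size method.

```python
def satterthwaite_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for the difference of two
    means when the group variances are not assumed equal.

    var1, var2: sample variances of the two groups
    n1, n2:     sample sizes of the two groups (each at least 2)
    """
    a = var1 / n1    # squared standard error contributed by group 1
    b = var2 / n2    # squared standard error contributed by group 2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
```

When variances and sample sizes are equal, the result matches the pooled value n1 + n2 - 2; as they diverge, it shrinks toward the smaller of n1 - 1 and n2 - 1.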

A Fully-Adjusted Two-Stage Procedure for Rank Normalization in Genetic Association Studies

10.1101/344770 ◽

2018 ◽

Author(s):

Tamar Sofer ◽

Xiuwen Zheng ◽

Stephanie M. Gogarten ◽

Cecelia A. Laurie ◽

Kelsey Grinde ◽

...

Keyword(s):

Statistical Power ◽

Type I Error ◽

Association Studies ◽

Genetic Association Studies ◽

Statistical Properties ◽

Type I ◽

Residual Distribution ◽

Two Stage ◽

Trait Distribution ◽

Error Rate Control

Abstract: When testing genotype-phenotype associations using linear regression, departure of the trait distribution from normality can impact both Type I error rate control and statistical power, with worse consequences for rarer variants. While it has been shown that applying a rank-normalization transformation to trait values before testing may improve these statistical properties, the factor driving them is not the trait distribution itself, but its residual distribution after regression on both covariates and genotype. Because genotype is expected to have a small effect (if any), investigators now routinely use a two-stage method, in which they first regress the trait on covariates, obtain residuals, rank-normalize them, and then use the rank-normalized residuals in association analysis with the genotypes. Potential confounding signals are assumed to be removed at the first stage, so in practice no further adjustment is done in the second stage. Here, we show that this widely-used approach can lead to tests with undesirable statistical properties, due to a combination of a mis-specified mean-variance relationship and remaining covariate associations between the rank-normalized residuals and genotypes. We demonstrate these properties theoretically, and also in applications to genome-wide and whole-genome sequencing association studies. We further propose and evaluate an alternative fully-adjusted two-stage approach that adjusts for covariates both when residuals are obtained and in the subsequent association test. This method can reduce excess Type I errors and improve statistical power.
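The rank-normalization step mentioned in the abstract can be sketched as a rank-based inverse normal transform. The version below is an assumption-laden illustration: the function name, the offset of 0.5, and the average-rank handling of ties are choices made here, not details taken from the paper, and the regression stages before and after the transform are omitted.

```python
from statistics import NormalDist

def rank_normalize(values, offset=0.5):
    """Map values to normal quantiles according to their ranks.

    Each value with rank r (1-based, ties given their average rank)
    becomes the standard normal quantile of
    (r - offset) / (n - 2 * offset + 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1   # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    z = NormalDist()
    return [z.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]
```

In the two-stage setting described above, the input would be residuals from regressing the trait on covariates; the fully-adjusted variant then includes the covariates again in the association test.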
