Common Medical and Statistical Problems: The Dilemma of the Sample Size Calculation for Sensitivity and Specificity Estimation

Mathematics ◽  
2020 ◽  
Vol 8 (8) ◽  
pp. 1258
Author(s):  
M. Rosário Oliveira ◽  
Ana Subtil ◽  
Luzia Gonçalves

Sample size calculation in biomedical practice is typically based on the problematic Wald method for a binomial proportion, with potentially dangerous consequences. This work highlights the need to incorporate the concept of conditional probability into sample size determination, to avoid reduced sample sizes that lead to inadequate confidence intervals. Therefore, new definitions are proposed for the coverage probability and expected length of confidence intervals for conditional probabilities, such as sensitivity and specificity. The new definitions were used to assess seven confidence interval estimation methods. To determine the sample size, two procedures were developed for each estimation method: an optimal one, based on the new definitions, and an approximation. Our findings confirm the similarity of the approximated sample sizes to the optimal ones. R code is provided to disseminate these methodological advances and translate them into biomedical practice.
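
As a rough illustration of why conditioning matters, the sketch below applies the textbook Wald sample-size formula to an anticipated sensitivity and then inflates the total sample size by the expected prevalence, since only diseased subjects contribute to the sensitivity estimate. The numbers (sensitivity 0.90, half-width 0.05, prevalence 0.20) are illustrative, and the code is a minimal Python sketch rather than the R code accompanying the paper.

```python
from math import ceil
from scipy.stats import norm

def wald_n(p, half_width, conf=0.95):
    """Standard Wald sample size for a single binomial proportion:
    n = z^2 * p * (1 - p) / d^2, where d is the desired half-width."""
    z = norm.ppf(1 - (1 - conf) / 2)
    return ceil(z**2 * p * (1 - p) / half_width**2)

# Illustrative values (not from the paper): anticipated sensitivity 0.90,
# desired half-width 0.05, disease prevalence 0.20 in the sampled population.
sens, d, prevalence = 0.90, 0.05, 0.20

n_diseased = wald_n(sens, d)              # cases needed to estimate sensitivity
n_total = ceil(n_diseased / prevalence)   # total subjects, since only a fraction are diseased

print(n_diseased, n_total)                # 139 diseased subjects -> about 695 subjects overall
```

Ignoring the conditioning step (i.e., planning only for n_diseased) would understate the study size by a factor of roughly one over the prevalence, which is the kind of reduced sample size the abstract warns against.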

2020 ◽  
Author(s):  
Evangelia Christodoulou ◽  
Maarten van Smeden ◽  
Michael Edlinger ◽  
Dirk Timmerman ◽  
Maria Wanitschek ◽  
...  

Abstract Background: We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data comes in. Methods: We illustrate the approach using data for the diagnosis of ovarian cancer (n=5914, 33% event fraction) and obstructive coronary artery disease (CAD; n=4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors and assumed linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients, and we used bootstrapping to internally validate the model. We sequentially added 50 random new patients until we reached a sample size of 3000, and re-estimated model performance at each step. We examined the required sample size for satisfying the following stopping rule: obtaining a calibration slope ≥ 0.9 and optimism in the c-statistic (ΔAUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors, and applying Firth’s bias correction. Results: Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450–500) for the ovarian cancer data (22 events per parameter (EPP), 20–24), and 750 patients (700–800) for the CAD data (30 EPP, 28–33). A stricter criterion, requiring ΔAUC ≤ 0.01, was met with a median of 500 (23 EPP) and 1350 (54 EPP) patients, respectively. These sample sizes were much higher than the well-known 10 EPP rule of thumb and slightly higher than a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relationships were modeled, and lower sample sizes when Firth’s correction was used. Conclusions: Adaptive sample size determination can be a useful supplement to a priori sample size calculations, because it makes it possible to further tailor the sample size to the specific prediction modeling context in a dynamic fashion.
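
The stopping rule hinges on two internally validated quantities: the calibration slope and the optimism in the c-statistic. Below is a minimal Python sketch of how these could be computed with bootstrapping for a logistic model with pre-specified predictors; the function name, the number of bootstrap replicates, and the use of a large C to approximate unpenalized maximum likelihood are our choices for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def bootstrap_validation(X, y, n_boot=200, seed=0):
    """Bootstrap internal validation of a logistic model with pre-specified predictors.
    X is an (n, p) array, y a 0/1 array. Returns (mean AUC optimism, mean calibration slope)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    optimism, slopes = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                        # draw a bootstrap resample
        Xb, yb = X[idx], y[idx]
        if yb.min() == yb.max():                           # skip resamples with a single class
            continue
        # large C approximates unpenalized maximum likelihood
        m = LogisticRegression(C=1e6, max_iter=1000).fit(Xb, yb)
        auc_boot = roc_auc_score(yb, m.decision_function(Xb))   # apparent AUC in the resample
        lp_orig = m.decision_function(X)                   # linear predictor on the original data
        auc_orig = roc_auc_score(y, lp_orig)               # AUC tested on the original data
        optimism.append(auc_boot - auc_orig)
        # calibration slope: logistic regression of the outcome on the fixed linear predictor
        cal = LogisticRegression(C=1e6, max_iter=1000).fit(lp_orig.reshape(-1, 1), y)
        slopes.append(cal.coef_[0, 0])
    return float(np.mean(optimism)), float(np.mean(slopes))
```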


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Evangelia Christodoulou ◽  
Maarten van Smeden ◽  
Michael Edlinger ◽  
Dirk Timmerman ◽  
Maria Wanitschek ◽  
...  

Abstract Background We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data comes in. Methods We illustrate the approach using data for the diagnosis of ovarian cancer (n = 5914, 33% event fraction) and obstructive coronary artery disease (CAD; n = 4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors and assumed linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients, and we used bootstrapping to internally validate the model. We sequentially added 50 random new patients until we reached a sample size of 3000 and re-estimated model performance at each step. We examined the required sample size for satisfying the following stopping rule: obtaining a calibration slope ≥ 0.9 and optimism in the c-statistic (or AUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors and correcting for bias in the model estimates (Firth’s correction). Results Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450–500) for the ovarian cancer data (22 events per parameter (EPP), 20–24) and 850 patients (750–900) for the CAD data (33 EPP, 30–35). A stricter criterion, requiring AUC optimism ≤ 0.01, was met with a median of 500 (23 EPP) and 1500 (59 EPP) patients, respectively. These sample sizes were much higher than the well-known 10 EPP rule of thumb and slightly higher than a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relationships were modeled, and lower sample sizes when Firth’s correction was used. Conclusions Adaptive sample size determination can be a useful supplement to fixed a priori sample size calculations, because it makes it possible to tailor the sample size to the specific prediction modeling context in a dynamic fashion.
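
A sketch of the adaptive recruitment loop itself is shown below, reusing the bootstrap_validation function sketched after the preceding entry. The stopping rule (calibration slope ≥ 0.9 and AUC optimism ≤ 0.02 at two consecutive sample sizes) follows the abstract; everything else (function names, the random recruitment order, step sizes) is an assumption for illustration.

```python
import numpy as np

def adaptive_sample_size(X_all, y_all, start=100, step=50, max_n=3000,
                         slope_min=0.9, optimism_max=0.02, seed=0):
    """Mimic prospective recruitment: grow the development set in steps of `step`
    patients and stop once the rule holds at two consecutive sample sizes."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y_all))          # random recruitment order
    n, met_before = start, False
    while n <= min(max_n, len(y_all)):
        idx = order[:n]
        optimism, slope = bootstrap_validation(X_all[idx], y_all[idx])
        met_now = (slope >= slope_min) and (optimism <= optimism_max)
        if met_now and met_before:
            return n                             # required sample size for this run
        met_before = met_now
        n += step
    return None                                  # rule never satisfied within max_n
```

Repeating this over many random recruitment orders (the paper uses 500 repetitions) yields the distribution of required sample sizes summarized in the Results.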


2020 ◽  
Author(s):  
David Douglas Newstein

Abstract Background: The assumption that the sampling distribution of the crude odds ratio (ORcrude) is a lognormal distribution with parameters mu and sigma leads to the incorrect conclusion that the expectation of the log of ORcrude is equal to the parameter mu. Here, the standard method of point and interval estimation (I) is compared with a modified method utilizing ORstar, where ln(ORstar) = ln(ORcrude) − sigma²/2. Methods: Confidence intervals are obtained utilizing ln(ORstar) by parametric bootstrap simulations with a percentile-derived confidence interval (II), by a simple calculation in which ln(ORcrude) is replaced with ln(ORstar) in the standard formula (III), and by a method proposed by Barendregt (IV), who also noted the bias present in estimating ORtrue by ORcrude. Simulations are conducted for a “protective” exposure (ORtrue < 1) as well as for a “harmful” exposure (ORtrue > 1). Results: In the simulations, estimation methods II and III exhibited the highest level of statistical conclusion validity for their confidence intervals, as indicated by one minus the coverage probability being close to alpha. The MC simulations also showed that these two methods produced the least biased point estimates and the narrowest confidence intervals of the four estimation approaches. Conclusions: Monte Carlo simulations prove useful in validating the inferential procedures used in data analysis. In the case of the odds ratio, the standard method of point and interval estimation is based on the assumption that the crude odds ratio has a lognormal sampling distribution. Utilizing this assumption, together with the formula for the expectation of this distribution, an alternative estimation method for ORtrue was obtained (different from the method in the earlier report by Barendregt) that yielded point and interval estimates that the MC simulations indicate are the most statistically valid.
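
As a minimal sketch of the corrected estimator and of a Wald-type interval built around it (in the spirit of method III), the Python code below simulates 2×2 tables from two binomials, applies ln(ORstar) = ln(ORcrude) − sigma²/2 with sigma² = 1/a + 1/b + 1/c + 1/d, and estimates coverage by Monte Carlo. The table probabilities, group sizes, and number of simulations are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import norm

def or_star(a, b, c, d):
    """Crude log odds ratio, its standard error sigma, and the corrected
    log(ORstar) = log(ORcrude) - sigma^2 / 2."""
    log_or = np.log((a * d) / (b * c))
    sigma = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, sigma, log_or - sigma**2 / 2

def coverage_corrected_wald(p_exposed, p_unexposed, n_exp, n_unexp, or_true,
                            n_sim=5000, alpha=0.05, seed=1):
    """Monte Carlo coverage of the Wald-type interval centred on log(ORstar)."""
    rng = np.random.default_rng(seed)
    z = norm.ppf(1 - alpha / 2)
    hits = valid = 0
    for _ in range(n_sim):
        a = rng.binomial(n_exp, p_exposed)        # cases among the exposed
        c = n_exp - a                             # non-cases among the exposed
        b = rng.binomial(n_unexp, p_unexposed)    # cases among the unexposed
        d = n_unexp - b                           # non-cases among the unexposed
        if min(a, b, c, d) == 0:
            continue                              # drop tables with an empty cell
        _, sigma, log_or_corrected = or_star(a, b, c, d)
        lo, hi = log_or_corrected - z * sigma, log_or_corrected + z * sigma
        valid += 1
        hits += lo <= np.log(or_true) <= hi
    return hits / valid

# Illustrative "harmful exposure" scenario: risk 0.30 among exposed, 0.15 among unexposed
or_true = (0.30 / 0.70) / (0.15 / 0.85)           # about 2.43
print(coverage_corrected_wald(0.30, 0.15, n_exp=100, n_unexp=100, or_true=or_true))
```

The same loop, run with ORtrue below 1, covers the "protective exposure" scenario described in the abstract.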


2019 ◽  
Author(s):  
Atser Damsma ◽  
Nadine Schlichting ◽  
Hedderik van Rijn ◽  
Warrick Roseboom

In interval timing experiments, motor reproduction is the predominant method used when participants are asked to estimate an interval. However, it is unknown how its accuracy, precision and efficiency compare to alternative methods, such as indicating the duration by spatial estimation on a timeline. In two experiments, we compared different interval estimation methods. In the first experiment, participants were asked to reproduce an interval by means of motor reproduction, timeline estimation, or verbal estimation. We found that, on average, verbal estimates were more accurate and precise than timeline estimates and motor reproductions. However, we found a bias towards familiar whole-second units when giving verbal estimates. Motor reproductions were more precise, but not more accurate, than timeline estimates. In the second experiment, we used a more complex task: participants were presented with a stream of digits and one target letter and were subsequently asked to reproduce both the interval to target onset and the duration of the total stream, by means of motor reproduction and timeline estimation. We found that motor reproductions were more accurate, but not more precise, than timeline estimates. In both experiments, timeline estimates had the lowest reaction times. Overall, our results suggest that the transformation of time into space has only a relatively minor cost. In addition, they show that each estimation method comes with its own advantages, and that the choice of estimation method depends on choices in the experimental design: for example, when using whole-second (integer) durations, verbal estimates are superior, yet when testing long durations, motor reproductions are time-intensive, making timeline estimates a more sensible choice.
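
For readers who want to summarize accuracy and precision per estimation method on their own data, the short Python sketch below computes relative error by method. The column names, the toy data, and the convention of using mean relative error for accuracy and its standard deviation for precision are our illustrative choices, not the authors' analysis.

```python
import pandas as pd

# Hypothetical long-format data: one row per trial, with the presented target
# duration (s), the participant's estimate (s), and the estimation method used.
df = pd.DataFrame({
    "method":   ["motor", "motor", "timeline", "timeline", "verbal", "verbal"],
    "target":   [2.0, 3.5, 2.0, 3.5, 2.0, 3.5],
    "estimate": [2.3, 3.2, 1.7, 3.9, 2.0, 4.0],
})

df["rel_error"] = (df["estimate"] - df["target"]) / df["target"]
summary = df.groupby("method").agg(
    accuracy=("rel_error", "mean"),                      # bias: 0 means accurate on average
    precision=("rel_error", lambda e: e.std(ddof=1)),    # spread: smaller means more precise
)
print(summary)
```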


2021 ◽  
Author(s):  
Metin Bulus

A recent systematic review of experimental studies conducted in Turkey between 2010 and 2020 reported that small sample sizes had been a significant drawback (Bulus and Koyuncu, 2021). A small fraction of the studies were small-scale true experiments (subjects randomized into treatment and control groups). The remaining studies consisted of quasi-experiments (subjects in treatment and control groups matched on pretest scores or other covariates) and weak experiments (neither randomized nor matched, but with a control group). They had an average sample size below 70 across different domains and outcomes. These small sample sizes imply a strong (and perhaps erroneous) assumption about the minimum relevant effect size (MRES) of an intervention before an experiment is conducted; that is, that a standardized intervention effect of Cohen’s d < 0.50 is not relevant to education policy or practice. Thus, an introduction to sample size determination for pretest-posttest simple experimental designs is warranted. This study describes the nuts and bolts of sample size determination, derives expressions for optimal design under differential cost per treatment and control unit, provides convenient tables to guide sample size decisions for MRES values between 0.20 ≤ Cohen’s d ≤ 0.50, and describes the relevant software along with illustrations.
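
The basic machinery behind such tables can be sketched with the textbook formulas below: a per-group sample size for a two-arm pretest-posttest design that uses the pretest as a covariate, and the cost-optimal allocation ratio under differential unit costs. These are standard expressions assumed here for illustration, not necessarily the exact expressions derived in the study, and the numerical inputs are likewise illustrative.

```python
from math import ceil, sqrt
from scipy.stats import norm

def n_per_group(d, rho=0.0, alpha=0.05, power=0.80):
    """Textbook per-group sample size for a two-arm pretest-posttest design,
    treating the pretest as a covariate:
    n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * (1 - rho^2) / d^2,
    where d is the MRES (Cohen's d) and rho the pretest-posttest correlation."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * z**2 * (1 - rho**2) / d**2)

def cost_optimal_ratio(cost_treatment, cost_control):
    """Optimal treatment:control allocation under differential unit costs
    (more subjects go to the cheaper arm): n_T / n_C = sqrt(c_C / c_T)."""
    return sqrt(cost_control / cost_treatment)

# Illustrative numbers: MRES d = 0.30, pretest-posttest correlation 0.5,
# treatment units costing twice as much as control units.
print(n_per_group(0.30, rho=0.5))               # ~131 per group vs ~175 without the pretest
print(round(cost_optimal_ratio(2.0, 1.0), 2))   # ~0.71: recruit fewer of the costlier treated units
```

Halving the assumed MRES from 0.50 to 0.25 roughly quadruples the required sample size, which is why the averages near 70 reported in the review implicitly assume effects of about d = 0.50 or larger.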


2018 ◽  
Vol 28 (7) ◽  
pp. 2179-2195 ◽  
Author(s):  
Chieh Chiang ◽  
Chin-Fu Hsiao

Multiregional clinical trials have been accepted in recent years as a useful means of accelerating the development of new drugs and abridging their approval time. The statistical properties of multiregional clinical trials are being widely discussed. In practice, the variance of a continuous response may differ from region to region, so assessment of the efficacy response becomes a Behrens–Fisher problem: there is no exact test or interval estimator for the mean difference when variances are unequal. As a solution, this study applies interval estimations of the efficacy response based on Howe’s, Cochran–Cox’s, and Satterthwaite’s approximations, which have been shown to have well-controlled type I error rates. However, traditional sample size determination cannot be applied to these interval estimators. A sample size determination procedure to achieve a desired power based on these interval estimators is therefore presented. Moreover, the consistency criteria suggested by the Japanese Ministry of Health, Labour and Welfare guidance were also applied, to decide whether the overall results from the multiregional clinical trial, obtained via the proposed interval estimation, hold across regions. A real example is used to illustrate the proposed method. The results of simulation studies indicate that the proposed method can correctly determine the required sample size and evaluate the assurance probability of the consistency criteria.
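
To make the Behrens–Fisher ingredients concrete, the Python sketch below builds a Satterthwaite-approximated confidence interval for the mean difference under unequal variances and searches, by simulation, for the smallest per-group sample size at which that interval excludes zero with a target probability. This mirrors the general idea of basing sample size on the interval estimator; the effect size, variances, search grid, and simulation settings are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np
from scipy import stats

def satterthwaite_ci(x, y, alpha=0.05):
    """Welch/Satterthwaite CI for mean(x) - mean(y) with unequal variances."""
    nx, ny = len(x), len(y)
    vx, vy = x.var(ddof=1) / nx, y.var(ddof=1) / ny
    df = (vx + vy) ** 2 / (vx**2 / (nx - 1) + vy**2 / (ny - 1))   # Satterthwaite df
    t = stats.t.ppf(1 - alpha / 2, df)
    diff, se = x.mean() - y.mean(), np.sqrt(vx + vy)
    return diff - t * se, diff + t * se

def required_n(delta, sd1, sd2, target_power=0.80, n_sim=2000, seed=0):
    """Smallest per-group n such that the CI lies entirely above zero with the target probability."""
    rng = np.random.default_rng(seed)
    for n in range(10, 2001, 10):
        rejections = 0
        for _ in range(n_sim):
            x = rng.normal(delta, sd1, n)      # treatment responses
            y = rng.normal(0.0, sd2, n)        # control responses
            lo, hi = satterthwaite_ci(x, y)
            rejections += lo > 0               # interval excludes zero
        if rejections / n_sim >= target_power:
            return n
    return None

print(required_n(delta=0.5, sd1=1.0, sd2=1.5))   # roughly 100-110 per group under these settings
```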

