How sample size influences the replicability of task-based fMRI

2017 ◽  
Author(s):  
Benjamin O. Turner ◽  
Erick J. Paul ◽  
Michael B. Miller ◽  
Aron K. Barbey

Despite a growing body of research suggesting that task-based functional magnetic resonance imaging (fMRI) studies often suffer from a lack of statistical power due to too-small samples, the proliferation of such underpowered studies continues unabated. Using large independent samples across eleven distinct tasks, we demonstrate the impact of sample size on replicability, assessed at different levels of analysis relevant to fMRI researchers. We find that the degree of replicability for typical sample sizes is modest and that even sample sizes much larger than typical (e.g., N = 100) produce results that fall well short of perfect replicability. Thus, our results join the existing line of work advocating for larger sample sizes. Moreover, because we test sample sizes over a fairly large range and use intuitive metrics of replicability, our hope is that our results will be more understandable and convincing to researchers who may have found previous work advocating for larger samples inaccessible.
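As a rough illustration of what such a replicability analysis involves, the following minimal sketch splits simulated subject-level statistic maps into two independent groups of size N, thresholds each group map, and summarises their agreement with a Dice overlap. The simulated signal, the threshold, and the overlap metric are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal split-sample replicability sketch (illustrative only; not the
# authors' fMRI pipeline). Subject maps are simulated as noisy copies of a
# common signal, and replicability is the Dice overlap between two
# independently thresholded group maps.
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 5000
signal = np.zeros(n_voxels)
signal[:500] = 0.5                      # "active" voxels with a modest effect

def group_map(n_subjects):
    """Simulate n_subjects subject maps and return a group t-like map."""
    subjects = signal + rng.normal(0, 1, size=(n_subjects, n_voxels))
    return subjects.mean(axis=0) / (subjects.std(axis=0, ddof=1) / np.sqrt(n_subjects))

def dice(a, b):
    return 2 * np.sum(a & b) / (np.sum(a) + np.sum(b))

for n in (16, 30, 100):                 # typical and "large" sample sizes
    t1, t2 = group_map(n), group_map(n)
    overlap = dice(t1 > 3.0, t2 > 3.0)  # arbitrary threshold for illustration
    print(f"N = {n:3d}: Dice overlap of thresholded maps = {overlap:.2f}")
```

Even in this toy setting, the overlap between the two "replications" grows with N but remains well below 1 at N = 100, mirroring the pattern described in the abstract.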

1997 ◽  
Vol 67 (12) ◽  
pp. 857-865 ◽  
Author(s):  
Patricia Bel-Berger ◽  
Terri Von Hoven

Combinations of gin and mill cleaning sequences have been studied to determine the best way to clean both smooth-leaf and hairy-leaf cottons. The two varieties were subjected to four different levels of lint cleaning at the gin, followed by nine different mill cleaning sequences, for a total of thirty-six samples. All samples were tested for fiber properties (Part II), yarn strength, and fabric strength and appearance. The yarn and fabric properties are reported in this paper. In the middle of the study, the card wire was damaged and subsequently replaced, which presented the opportunity to determine the impact of the card wire's condition on white specks. In addition, image analysis of the fabric samples with Optimas measured the percent white, defined as the percentage of white-speck area within a specified area of fabric. Because of the variability of white specks, a larger sample size was needed than was available for the mill samples, so only trends can be reported for the mill samples. In general, the more aggressive the cleaning, the higher the percent white. With respect to ginning, each additional lint cleaner produced an increase in percent white for the worn card wire. The new card wire decreased the percent white overall as compared to the worn card wire. The new card wire samples with three lint cleanings had a significantly higher white speck level than samples with zero, one, or two lint cleanings. Similarly, the harsher the mill cleaning, the higher the percent white. The hairy-leaf variety produced percent white values similar to those for the smooth-leaf cotton for both the old and new card wires. Thus, when confronted with the possibility of a white speck problem, minimal gin cleaning and less aggressive mill cleaning are recommended.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Florent Le Borgne ◽  
Arthur Chatton ◽  
Maxime Léger ◽  
Rémi Lenain ◽  
Yohann Foucher

Abstract In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative to these methods because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and that is able to deal with small samples. We evaluated the performances of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates, and relationships between covariates, exposure statuses, and outcomes. We have also illustrated the application of these methods, in which they were used to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of G-computation, for estimating the individual outcome probabilities in the two counterfactual worlds, we found that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine also performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation associated with the super learner was a well-performing method for drawing causal inferences, even from small sample sizes.
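The core G-computation step described here, predicting each individual's outcome probability in the two counterfactual worlds and averaging, can be sketched as follows. This minimal illustration substitutes a plain logistic regression for the super learner evaluated in the paper, and the simulated data and variable names are assumptions for illustration only.

```python
# Minimal G-computation sketch for a binary exposure and binary outcome
# (illustrative only; the paper evaluates richer learners, e.g. a super
# learner, in place of the plain logistic regression used here).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 200                                          # a "small sample" setting
X = rng.normal(size=(n, 3))                      # baseline covariates
expo = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))          # exposure status
logit = -0.5 + 0.8 * expo + X[:, 0] - 0.5 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))    # binary outcome

# 1) Fit an outcome model on exposure + covariates.
design = np.column_stack([expo, X])
q_model = LogisticRegression(max_iter=1000).fit(design, y)

# 2) Predict each subject's outcome in the two counterfactual worlds.
d1 = np.column_stack([np.ones(n), X])            # everyone exposed
d0 = np.column_stack([np.zeros(n), X])           # everyone unexposed
p1 = q_model.predict_proba(d1)[:, 1]
p0 = q_model.predict_proba(d0)[:, 1]

# 3) Marginal effect estimates by averaging over the sample.
print("risk difference:", p1.mean() - p0.mean())
print("risk ratio:     ", p1.mean() / p0.mean())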


2019 ◽  
Author(s):  
Peter E Clayson ◽  
Kaylie Amanda Carbine ◽  
Scott Baldwin ◽  
Michael J. Larson

Methodological reporting guidelines for studies of event-related potentials (ERPs) were updated in Psychophysiology in 2014. These guidelines facilitate the communication of key methodological parameters (e.g., preprocessing steps). Failing to report key parameters represents a barrier to replication efforts, and difficulty with replicability increases in the presence of small sample sizes and low statistical power. We assessed whether the guidelines are followed and estimated the average sample size and power in recent research. Reporting behavior, sample sizes, and statistical designs were coded for 150 randomly sampled articles published from 2011 to 2017 in five high-impact journals that frequently publish ERP research. An average of 63% of guidelines were reported, and reporting behavior was similar across journals, suggesting that gaps in reporting are a shortcoming of the field rather than of any specific journal. Publication of the guidelines paper had no impact on reporting behavior, suggesting that editors and peer reviewers are not enforcing these recommendations. The average sample size per group was 21. Statistical power was conservatively estimated as .72-.98 for a large effect size, .35-.73 for a medium effect, and .10-.18 for a small effect. These findings indicate that failure to report key guidelines is ubiquitous and that ERP studies are primarily powered to detect large effects. Such low power and insufficient adherence to reporting guidelines represent substantial barriers to replication efforts. The methodological transparency and replicability of studies can be improved by the open sharing of processing code and experimental tasks and by a priori sample size calculations to ensure adequately powered studies.
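The reported power ranges can be roughly reconstructed with a standard power calculation, assuming a two-sample design with the reported average of 21 participants per group and Cohen's conventional effect-size benchmarks (assumptions for illustration; the surveyed studies used a variety of designs).

```python
# Rough reconstruction of the kind of power estimate reported above, assuming
# a two-sample t-test with 21 subjects per group and Cohen's conventional
# effect-size benchmarks (illustrative assumptions).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("large", 0.8), ("medium", 0.5), ("small", 0.2)]:
    power = analysis.power(effect_size=d, nobs1=21, alpha=0.05, ratio=1.0)
    print(f"{label:6s} effect (d = {d}): power = {power:.2f}")
```

Under these assumptions the computed power comes out near the lower bounds of the ranges quoted above (roughly .71, .36, and .10 for large, medium, and small effects).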


2021 ◽  
Vol 3 (1) ◽  
pp. 61-89
Author(s):  
Stefan Geiß

Abstract This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size, and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In the most widespread sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥.80 or ≥.667 corresponds to the simulation results, yielding acceptable α and β error rates. However, the simulation allows precise power calculations that take the specifics of each study's context into account, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g., when constructs are hard to measure). I supply equations, easy-to-use tables, and R functions to facilitate use of this framework, along with example code as an online appendix.
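The logic of such a simulation can be sketched in a few lines: coding error attenuates the observed correlation, which in turn lowers the power of the hypothesis test. The Python sketch below uses a deliberately crude coder-error model (a coded value is replaced by noise with probability 1 − reliability) and illustrative parameters; it is not the published R implementation.

```python
# Minimal Monte Carlo sketch of how imperfect coding erodes statistical power
# for a correlational hypothesis (illustrative only; the published R functions
# implement a more detailed model of coder behaviour).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def estimate_power(n, true_r, reliability, reps=2000, alpha=0.05):
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
        # Coding error: with probability (1 - reliability) a coded value is
        # replaced by pure noise, attenuating the observed correlation.
        noisy = rng.random(n) > reliability
        y_coded = np.where(noisy, rng.normal(size=n), y)
        r, p = stats.pearsonr(x, y_coded)
        if p < alpha:
            hits += 1
    return hits / reps

for rel in (0.667, 0.80, 0.95):
    pw = estimate_power(n=200, true_r=0.2, reliability=rel)
    print(f"reliability {rel:.3f}: estimated power = {pw:.2f}")
```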


2020 ◽  
Author(s):  
Chia-Lung Shih ◽  
Te-Yu Hung

Abstract Background A small sample size (n < 30 per treatment group) is usually enrolled to investigate differences in efficacy between treatments for knee osteoarthritis (OA). The objective of this study was to use simulation to compare the power of four statistical methods for the analysis of small sample sizes when detecting differences in efficacy between two treatments for knee OA. Methods A total of 10,000 replicates of 5 sample sizes (n=10, 15, 20, 25, and 30 per group) were generated based on previously reported measures of treatment efficacy. Four statistical methods were used to compare the differences in efficacy between treatments: the two-sample t-test (t-test), the Mann-Whitney U-test (M-W test), the Kolmogorov-Smirnov test (K-S test), and the permutation test (perm-test). Results The bias of the simulated parameter means showed a decreasing trend with sample size, but the CV% of the simulated parameter means varied with sample size for all parameters. For the largest sample size (n=30), the CV% reached a small level (<20%) for almost all parameters, but the bias did not. Among the non-parametric tests for the analysis of small sample sizes, the perm-test had the highest statistical power, and its false positive rate was not affected by sample size. However, the power of the perm-test did not reach a high value (80%) even at the largest sample size (n=30). Conclusion The perm-test is suggested for the analysis of small sample sizes when comparing the differences in efficacy between two treatments for knee OA.
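For reference, the permutation test recommended here can be implemented in a few lines. The sketch below runs a two-sample permutation test for a difference in means on simulated data; the data and effect size are illustrative assumptions, not the knee-OA efficacy measures analysed in the study.

```python
# Minimal two-sample permutation test for a difference in means (illustrative;
# the simulated data are not the knee-OA efficacy measures used in the paper).
import numpy as np

rng = np.random.default_rng(3)
treat = rng.normal(loc=1.0, scale=2.0, size=20)    # n = 20 per group
control = rng.normal(loc=0.0, scale=2.0, size=20)

observed = treat.mean() - control.mean()
pooled = np.concatenate([treat, control])

n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)                 # shuffle group labels
    diff = perm[:20].mean() - perm[20:].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = (count + 1) / (n_perm + 1)               # add-one correction
print(f"observed difference = {observed:.2f}, permutation p = {p_value:.4f}")
```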


2020 ◽  
Vol 6 (2) ◽  
pp. 106-113
Author(s):  
A. M. Grjibovski ◽  
M. A. Gorbatova ◽  
A. N. Narkevich ◽  
K. A. Vinogradov

Sample size calculation at the planning phase is still uncommon in Russian research practice. This situation threatens the validity of conclusions and may introduce Type II errors, in which a false null hypothesis is accepted due to a lack of statistical power to detect an existing difference between the means. Comparing two means using unpaired Student's t-tests is the most common statistical procedure in the Russian biomedical literature. However, calculations of the minimal required sample size or retrospective calculations of statistical power were observed in only a few publications. In this paper we demonstrate how to calculate the required sample size for comparing means in unpaired samples using WinPepi and Stata software. In addition, we produced tables of the minimal required sample size for studies in which two means are compared and body mass index or blood pressure is the variable of interest. The tables were constructed for unpaired samples for different levels of statistical power and standard deviations obtained from the literature.
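The same calculation can be reproduced in general-purpose statistical software. The sketch below uses statsmodels with an illustrative standard deviation and minimal important difference; these values are assumptions for illustration, not the values tabulated in the paper.

```python
# Minimal required-sample-size sketch for comparing two means with an unpaired
# t-test (the SD and minimal important difference below are illustrative
# assumptions, not the values tabulated in the paper).
from statsmodels.stats.power import TTestIndPower

sd = 4.0          # assumed common standard deviation (e.g., BMI in kg/m^2)
delta = 2.0       # smallest difference worth detecting
effect_size = delta / sd

analysis = TTestIndPower()
for power in (0.80, 0.90):
    n = analysis.solve_power(effect_size=effect_size, power=power,
                             alpha=0.05, ratio=1.0, alternative="two-sided")
    print(f"power = {power:.0%}: about {n:.0f} subjects per group")
```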


2019 ◽  
Author(s):  
Pengchao Ye ◽  
Wenbin Ye ◽  
Congting Ye ◽  
Shuchao Li ◽  
Lishan Ye ◽  
...  

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from extremely high dropout rates and cell-to-cell variability, demanding new methods to recover lost gene expression. Despite the availability of various dropout imputation approaches for scRNA-seq, most studies focus on data with a medium or large number of cells, while few have explicitly investigated differential performance across sample sizes or the applicability of such approaches to small or imbalanced data. It is imperative to develop new imputation approaches with higher generalizability for data of various sample sizes. Results We propose a method called scHinter for imputing dropout events in scRNA-seq data, with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority oversampling technique (SMOTE) for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of the imputation for small samples. We demonstrate the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes, and we comprehensively examine the impact of sample size and cluster number on imputation. Comprehensive evaluation of scHinter across diverse scRNA-seq datasets with imbalanced or limited sample sizes shows that scHinter achieves higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER and netSmooth. Availability and implementation Freely available for download at https://github.com/BMILAB/scHinter. Supplementary information Supplementary data are available at Bioinformatics online.
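To make the problem setting concrete, the sketch below performs a generic nearest-neighbour dropout imputation on simulated expression values. It is emphatically not the scHinter algorithm (which combines a voting-based ensemble distance, SMOTE-style interpolation, and a hierarchical framework); it only illustrates the task of recovering values lost to dropout.

```python
# Generic k-nearest-neighbour imputation sketch for scRNA-seq dropouts, shown
# only to illustrate the problem setting; this is NOT the scHinter algorithm.
import numpy as np

rng = np.random.default_rng(4)
n_cells, n_genes = 50, 200
true_expr = rng.gamma(shape=2.0, scale=2.0, size=(n_cells, n_genes))
dropout = rng.random((n_cells, n_genes)) < 0.6       # ~60% dropout rate
observed = np.where(dropout, 0.0, true_expr)

def impute_knn(expr, k=5):
    """Replace zeros in each cell with the average of its k nearest cells."""
    imputed = expr.copy()
    # Euclidean distances between cells on the zero-inflated matrix.
    dists = np.linalg.norm(expr[:, None, :] - expr[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    for i in range(expr.shape[0]):
        neighbours = np.argsort(dists[i])[:k]
        zero_genes = expr[i] == 0
        imputed[i, zero_genes] = expr[neighbours][:, zero_genes].mean(axis=0)
    return imputed

recovered = impute_knn(observed)
err_before = np.abs(observed - true_expr)[dropout].mean()
err_after = np.abs(recovered - true_expr)[dropout].mean()
print(f"mean absolute error at dropout sites: {err_before:.2f} -> {err_after:.2f}")
```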


2003 ◽  
Vol 78 (4) ◽  
pp. 983-1002 ◽  
Author(s):  
Randal J. Elder ◽  
Robert D. Allen

This study examines changes in auditor risk assessments and sample size decisions based on information gathered from three large accounting firms for audits during 1994 and 1999. The five-year interval between data collection periods allows us to measure changes in risk assessments and sample sizes between the two periods. Auditors relied on controls and assessed inherent risk below the maximum on most audits, and were more likely to do so in the later period, consistent with a trend of lower risk assessment levels. Average sample sizes declined between 1994 and 1999 for the firms that had larger sample sizes in the earlier period. Overall, we find a significant relationship between inherent risk assessments and sample sizes, but this relationship is stronger in the earlier period and is not significant for all firms, especially in the later period. We find limited evidence of a relationship between control risk and sample sizes.


2015 ◽  
Vol 27 (1) ◽  
pp. 114-125 ◽  
Author(s):  
BC Tai ◽  
ZJ Chen ◽  
D Machin

In designing randomised clinical trials involving competing risks endpoints, it is important to consider competing events to ensure appropriate determination of sample size. We conduct a simulation study to compare sample sizes obtained from the cause-specific hazard and cumulative incidence (CMI) approaches, first assuming exponential event times. As the proportional subdistribution hazard assumption does not hold for the CMI exponential (CMIExponential) model, we further investigate the impact of violating this assumption by comparing the results obtained from the CMI exponential model with those of a CMI model assuming a Gompertz distribution (CMIGompertz), for which the proportionality assumption is tenable. The simulation suggests that the CMIExponential approach requires a considerably larger sample size when treatment reduces the hazards of both the main event, A, and the competing risk, B. When treatment has a beneficial effect on A but no effect on B, the sample sizes required by the two methods are largely similar, especially for a large reduction in the main risk. If treatment has a protective effect on A but adversely affects B, then the sample size required by CMIExponential is notably smaller than that required by the cause-specific hazard approach for a small to moderate reduction in the main risk. Further, a smaller sample size is required for CMIGompertz than for CMIExponential. The choice between a cause-specific hazard and a CMI model for competing risks outcomes has implications for the study design. It should be made on the basis of the clinical question of interest and the validity of the associated model assumption.
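The data-generating setup behind such simulations can be sketched directly: each subject has latent exponential times to the main event A and the competing event B, and the observed outcome is whichever occurs first within follow-up. The hazard rates and treatment effects below are illustrative assumptions, not the scenarios simulated in the paper.

```python
# Sketch of the exponential competing-risks setup underlying such simulations
# (hazard rates and treatment effects are illustrative assumptions). Each
# subject's observed event is whichever of A or B occurs first.
import numpy as np

rng = np.random.default_rng(5)

def simulate_arm(n, hazard_a, hazard_b, follow_up=5.0):
    t_a = rng.exponential(1 / hazard_a, size=n)     # latent time to event A
    t_b = rng.exponential(1 / hazard_b, size=n)     # latent time to event B
    time = np.minimum.reduce([t_a, t_b, np.full(n, follow_up)])
    event = np.where(t_a <= t_b, 1, 2)              # 1 = main event, 2 = competing
    event = np.where(time < follow_up, event, 0)    # 0 = censored at follow-up
    return time, event

# Control arm vs. a treatment that reduces the hazards of both A and B.
_, ev_ctrl = simulate_arm(1000, hazard_a=0.20, hazard_b=0.10)
_, ev_trt = simulate_arm(1000, hazard_a=0.20 * 0.7, hazard_b=0.10 * 0.7)

for label, ev in [("control", ev_ctrl), ("treatment", ev_trt)]:
    print(f"{label:9s}: event A {np.mean(ev == 1):.2%}, "
          f"event B {np.mean(ev == 2):.2%}, censored {np.mean(ev == 0):.2%}")
```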


2013 ◽  
Vol 37 (4) ◽  
pp. 383-392 ◽  
Author(s):  
Karla J. Lindquist ◽  
Eric Jorgenson ◽  
Thomas J. Hoffmann ◽  
John S. Witte
