Toward Leveraging Human Connectomic Data in Large Consortia: Generalizability of fMRI-Based Brain Graphs Across Sites, Sessions, and Paradigms

2018 ◽  
Vol 29 (3) ◽  
pp. 1263-1279 ◽  
Author(s):  
Hengyi Cao ◽  
Sarah C McEwen ◽  
Jennifer K Forsyth ◽  
Dylan G Gee ◽  
Carrie E Bearden ◽  
...  

Abstract While graph theoretical modeling has dramatically advanced our understanding of complex brain systems, the feasibility of aggregating connectomic data in large imaging consortia remains unclear. Here, using a battery of cognitive, emotional, and resting fMRI paradigms, we investigated the generalizability of functional connectomic measures across sites and sessions. Our results revealed overall fair to excellent reliability for a majority of measures during both rest and tasks, in particular for those quantifying connectivity strength, network segregation, and network integration. Processing schemes such as node definition and global signal regression (GSR) significantly affected the resulting reliability, with higher reliability detected for the Power atlas (vs. the AAL atlas) and for data without GSR. While network diagnostics for default-mode and sensorimotor systems were consistently reliable regardless of paradigm, those for higher-order cognitive systems were reliable predominantly when challenged by task. In addition, based on our present sample and after accounting for observed reliability, satisfactory statistical power can be achieved in multisite research with a sample size of approximately 250 when the effect size is moderate or larger. Our findings provide empirical evidence for the generalizability of brain functional graphs in large consortia, and encourage the aggregation of connectomic measures using multisite and multisession data.
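The closing power claim can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' method: it assumes Spearman's attenuation of a correlation-type effect by measurement reliability and the Fisher-z approximation for the required sample size, and all numbers are illustrative.

```python
import math

def required_n(r_true, reliability, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation after attenuation
    by measurement reliability (Fisher-z method). Illustrative only;
    'reliability' stands in for sqrt(rel_x * rel_y)."""
    r_obs = r_true * reliability  # Spearman's attenuation
    z_alpha = 1.959964            # two-sided 5%
    z_beta = 0.841621             # 80% power
    fisher_z = math.atanh(r_obs)
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

print(required_n(0.30, 1.00))  # moderate effect, perfect measurement
print(required_n(0.30, 0.70))  # same effect, imperfect reliability
```

Lower reliability inflates the required N, which is why "accounting for observed reliability" matters when planning multisite samples.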

2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, several drawbacks render the procedure "a mess." Specifically, identifying the minimally meaningful effect size is often difficult yet unavoidable if the procedure is to be conducted properly, the procedure is not precision oriented, and it does not encourage the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning samples, such as collecting the maximum sample size feasible.
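For readers unfamiliar with the procedure being critiqued, a minimal a priori power analysis looks like the following. This is a normal approximation to the two-sided two-sample t-test, so it slightly underestimates the exact t-based sample size; the effect sizes shown are Cohen's conventional benchmarks, not values from the study.

```python
import math

def n_per_group(d, alpha=0.05, power=0.80):
    """A priori sample size per group for a two-group comparison,
    normal approximation to the two-sided t-test. 'd' is the
    minimally meaningful standardized effect (Cohen's d)."""
    # The stdlib has no inverse-normal function, so quantiles for the
    # common alpha/power combinations are tabulated here.
    quantiles = {(0.05, 0.80): (1.959964, 0.841621),
                 (0.05, 0.90): (1.959964, 1.281552)}
    z_alpha, z_beta = quantiles[(alpha, power)]
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # "medium" effect
print(n_per_group(0.2))  # "small" effect
```

The abstract's central complaint is visible in the first argument: the whole calculation hinges on a defensible choice of `d`, which researchers rarely justify.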


2021 ◽  
Vol 3 (1) ◽  
pp. 61-89
Author(s):  
Stefan Geiß

Abstract This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size, and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In most widespread sample size/effect size settings, the rule-of-thumb that chance-adjusted agreement should be ≥.80 or ≥.667 corresponds to the simulation results, resulting in acceptable α and β error rates. However, this simulation allows making precise power calculations that can consider the specifics of each study's context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g., when constructs are hard to measure). I supply equations, easy-to-use tables, and R functions to facilitate use of this framework, along with example code as an online appendix.
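The logic of the simulation can be sketched in a few lines: coding error attenuates the observed correlation by the square root of the coded variable's reliability, which in turn lowers the power of the correlation test. The sketch below is a rough analogue under assumed parameter values, not the author's R code.

```python
import math, random

def mc_power(r_true, reliability, n, sims=2000, seed=1):
    """Monte Carlo power of a two-sided correlation test when one
    variable is coded with error. Rough analogue of the paper's setup."""
    rng = random.Random(seed)
    # Noise variance chosen so var(true)/var(observed) = reliability.
    noise_sd = math.sqrt(1 / reliability - 1)
    crit = 1.959964  # two-sided 5% on the Fisher-z statistic
    hits = 0
    for _ in range(sims):
        xs, ys = [], []
        for _ in range(n):
            x = rng.gauss(0, 1)
            y = r_true * x + math.sqrt(1 - r_true ** 2) * rng.gauss(0, 1)
            xs.append(x + noise_sd * rng.gauss(0, 1))  # noisy coding
            ys.append(y)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        r = sxy / math.sqrt(sxx * syy)
        z = math.atanh(r) * math.sqrt(n - 3)
        hits += abs(z) > crit
    return hits / sims

print(mc_power(0.30, 0.90, n=100))  # good coding reliability
print(mc_power(0.30, 0.50, n=100))  # poor coding reliability
```

The drop in power with falling reliability is exactly the trade-off the paper quantifies, and it shows why a larger sample can compensate for borderline coding reliability.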


2020 ◽  
pp. 28-63
Author(s):  
A. G. Vinogradov

The article belongs to a modern genre of scholarly publication, the so-called tutorial: an article that presents the latest methods of design, modeling, or analysis in an accessible format in order to disseminate best practices. The article acquaints Ukrainian psychologists with the basics of using the R programming language for the analysis of empirical research data. It discusses the current state of world psychology in connection with the Crisis of Confidence, which arose from the low reproducibility of empirical research. This problem is caused by the poor quality of psychological measurement tools, insufficient attention to adequate sample planning, typical statistical hypothesis testing practices, and so-called "questionable research practices." The tutorial demonstrates methods for determining the sample size depending on the expected magnitude of the effect size and the desired statistical power, performing basic variable transformations, and conducting statistical analysis of psychological research data using the R language and environment. It presents the minimal set of R functions required to carry out modern reliability analysis of measurement scales, sample size calculation, and point and interval estimation of effect size for the four designs most widespread in psychology for analyzing the interdependence of two variables. These typical problems include finding differences between means and variances in two or more samples, and correlations between continuous and categorical variables. Practical information on data preparation, import, basic transformations, and the application of basic statistical methods in the cloud version of RStudio is provided.
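One of the tutorial's core tasks, point and interval estimation of a standardized effect size, can be sketched as follows. This uses pooled-SD Cohen's d with the large-sample variance approximation of Hedges and Olkin; it is a sketch of the technique, not code from the tutorial, and the input numbers are invented.

```python
import math

def cohens_d_ci(mean1, mean2, sd1, sd2, n1, n2, z=1.959964):
    """Point estimate and approximate 95% CI for Cohen's d
    (pooled SD; large-sample standard error)."""
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                   / (n1 + n2 - 2))
    d = (mean1 - mean2) / sp
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, d - z * se, d + z * se

# Hypothetical two-group comparison, 50 participants per group
d, lo, hi = cohens_d_ci(105, 100, 10, 10, 50, 50)
print(round(d, 2), round(lo, 2), round(hi, 2))
```

Note how wide the interval is even at n = 50 per group; this is the precision problem that motivates interval estimation alongside point estimates.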


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 6516-6516
Author(s):  
P. Bedard ◽  
M. K. Krzyzanowska ◽  
M. Pintilie ◽  
I. F. Tannock

Background: Underpowered randomized clinical trials (RCTs) may expose participants to the risks and burdens of research without scientific merit. We investigated the prevalence of underpowered RCTs presented at ASCO annual meetings. Methods: We surveyed all two-arm parallel phase III RCTs presented at the ASCO annual meeting from 1995–2003 in which differences for the primary endpoint were not statistically significant. Post hoc calculations were performed using a power of 80% and α = 0.05 (two-sided) to determine the sample size required to detect a small, medium, and large effect size between the two groups. For studies reporting a proportion or time to event as a primary endpoint, effect size was expressed as an odds ratio (OR) or hazard ratio (HR) respectively, with a small effect size defined as OR/HR = 1.3, a medium effect size as OR/HR = 1.5, and a large effect size as OR/HR = 2.0. Logistic regression was used to identify factors associated with lack of statistical power. Results: Of 423 negative RCTs for which post hoc sample size calculations could be performed, 45 (10.6%), 138 (32.6%), and 333 (78.7%) had adequate sample size to detect small, medium, and large effect sizes respectively. Only 35 negative RCTs (7.1%) reported a reason for inadequate sample size. In a multivariable model, studies presented at plenary or oral sessions (p<0.0001) and multicenter studies supported by a co-operative group (p<0.0001) were more likely to have adequate sample size. Conclusion: Two-thirds of negative RCTs presented at the ASCO annual meeting did not have an adequate sample size to detect a medium-sized treatment effect. Most underpowered negative RCTs did not report a sample size calculation or reasons for inadequate patient accrual. No significant financial relationships to disclose.
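The post hoc calculation described here can be approximated with Schoenfeld's formula for the number of events required to detect a given hazard ratio with a two-sided log-rank test. The version below assumes 1:1 allocation and may differ in detail from the authors' exact method.

```python
import math

def events_needed(hr, alpha=0.05, power=0.80):
    """Approximate number of events needed to detect a hazard ratio
    (Schoenfeld's formula, 1:1 allocation, two-sided log-rank test)."""
    z_alpha, z_beta = 1.959964, 0.841621  # alpha=.05 two-sided, 80% power
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2)

# Small, medium, and large effects as defined in the abstract
for hr in (1.3, 1.5, 2.0):
    print(hr, events_needed(hr))
```

The steep growth in required events as the hazard ratio shrinks toward 1.3 makes clear why so few of the surveyed trials were powered for small effects.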


Scientifica ◽  
2016 ◽  
Vol 2016 ◽  
pp. 1-5 ◽  
Author(s):  
R. Eric Heidel

Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.


2021 ◽  
Author(s):  
Nick J. Broers ◽  
Henry Otgaar

Since the early work of Cohen (1962), psychological researchers have become aware of the importance of doing a power analysis to ensure that the predicted effect will be detectable with sufficient statistical power. APA guidelines require researchers to provide a justification of the chosen sample size with reference to the expected effect size, an expectation that should be based on previous research. However, we argue that a credible estimate of an expected effect size is reasonable under only two conditions: either the new study forms a direct replication of earlier work, or the outcome scale makes use of meaningful and familiar units that allow for the quantification of a minimal effect of psychological interest. In practice, neither of these conditions is usually met. We propose a different rationale for a power analysis that will ensure that researchers will be able to justify their sample size as meaningful and adequate.


2017 ◽  
Author(s):  
Clarissa F. D. Carneiro ◽  
Thiago C. Moulin ◽  
Malcolm R. Macleod ◽  
Olavo B. Amaral

Abstract Proposals to increase research reproducibility frequently call for focusing on effect sizes instead of p values, as well as for increasing the statistical power of experiments. However, it is unclear to what extent these two concepts are indeed taken into account in basic biomedical science. To study this in a real-case scenario, we performed a systematic review of effect sizes and statistical power in studies on learning of rodent fear conditioning, a widely used behavioral task to evaluate memory. Our search criteria yielded 410 experiments comparing control and treated groups in 122 articles. Interventions had a mean effect size of 29.5%, and amnesia caused by memory-impairing interventions was nearly always partial. Mean statistical power to detect the average effect size observed in well-powered experiments with significant differences (37.2%) was 65%, and was lower among studies with non-significant results. Only one article reported a sample size calculation, and our estimated sample size to achieve 80% power considering typical effect sizes and variances (15 animals per group) was reached in only 12.2% of experiments. Actual effect sizes correlated with effect size inferences made by readers on the basis of textual descriptions of results only when findings were non-significant, and neither effect size nor power correlated with study quality indicators, number of citations or impact factor of the publishing journal. In summary, effect sizes and statistical power have a wide distribution in the rodent fear conditioning literature, but do not seem to have a large influence on how results are described or cited. Failure to take these concepts into consideration might limit attempts to improve reproducibility in this field of science.
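The kind of power estimate reported here can be approximated analytically. The sketch below uses a normal approximation for a two-group comparison; the group sizes and the standardized effect size are illustrative assumptions, not values taken from the review.

```python
import math

def approx_power(d, n_per_group):
    """Approximate power of a two-sided, two-group comparison at
    alpha = .05 (normal approximation to the t-test)."""
    phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    z_alpha = 1.959964
    ncp = d * math.sqrt(n_per_group / 2)  # noncentrality parameter
    return phi(ncp - z_alpha) + phi(-ncp - z_alpha)

# Hypothetical large standardized effect (d = 1.0):
print(round(approx_power(1.0, 9), 2))   # a common small group size
print(round(approx_power(1.0, 15), 2))  # the review's 15/group benchmark
```

Even a large effect yields modest power at typical rodent group sizes, which is consistent with the low mean power the review reports.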


Author(s):  
David Clark-Carter

This chapter explores why effect size needs to be taken into account when designing and reporting research. It gives an effect size for each of the standard statistical tests which health and clinical psychologists employ, and looks at the need to consider statistical power when choosing a sample size for a study and how statistical power can help to guide the advice which can be given when discussing future research.


2012 ◽  
Vol 60 (6) ◽  
pp. 381 ◽  
Author(s):  
Evan Watkins ◽  
Julian Di Stefano

Hypotheses relating to the annual frequency distribution of mammalian births are commonly tested using a goodness-of-fit procedure. Several interacting factors influence the statistical power of these tests, but no power studies have been conducted using scenarios derived from biological hypotheses. Corresponding to theories relating reproductive output to seasonal resource fluctuation, we simulated data reflecting a winter reduction in birth frequency to test the effect of four factors (sample size, maximum effect size, the temporal pattern of response and the number of categories used for analysis) on the power of three goodness-of-fit procedures – the G and Chi-square tests and Watson’s U2 test. Analyses resulting in high power all had a large maximum effect size (60%) and were associated with a sample size of 200 on most occasions. The G-test was the most powerful when data were analysed using two temporal categories (winter and other) while Watson’s U2 test achieved the highest power when 12 monthly categories were used. Overall, the power of most modelled scenarios was low. Consequently, we recommend using power analysis as a research planning tool, and have provided a spreadsheet enabling a priori power calculations for the three tests considered.
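The paper's simulation design can be sketched for the simplest case it considers: two temporal categories (winter vs. other) and a chi-square goodness-of-fit test. Everything below, including the null proportions, reduction size, and simulation count, is an assumption for illustration, not the authors' code.

```python
import math, random

def gof_power(n, winter_reduction, sims=2000, seed=7):
    """Monte Carlo power of a chi-square goodness-of-fit test for a
    winter deficit in births, two categories (winter vs. other)."""
    rng = random.Random(seed)
    p0_winter = 3 / 12                     # null: births uniform over months
    w = p0_winter * (1 - winter_reduction)
    p1_winter = w / (w + (1 - p0_winter))  # renormalized alternative
    hits = 0
    for _ in range(sims):
        winter = sum(rng.random() < p1_winter for _ in range(n))
        exp_w, exp_o = n * p0_winter, n * (1 - p0_winter)
        chi2 = ((winter - exp_w) ** 2 / exp_w
                + (n - winter - exp_o) ** 2 / exp_o)
        # Survival function of the chi-square distribution with 1 df
        p = math.erfc(math.sqrt(chi2 / 2))
        hits += p < 0.05
    return hits / sims

print(gof_power(n=200, winter_reduction=0.6))  # large effect, large sample
print(gof_power(n=50, winter_reduction=0.6))   # same effect, small sample
```

The contrast between the two runs mirrors the paper's finding that adequate power generally required both a large maximum effect size and a sample of around 200.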

