A systematic review of sample size and power in leading neuroscience journals

2017 ◽  
Author(s):  
Alice Carter ◽  
Kate Tilling ◽  
Marcus R Munafò

Abstract Adequate sample size is key to reproducible research findings: low statistical power can increase the probability that a statistically significant result is a false positive. Journals are increasingly adopting methods to tackle issues of reproducibility, such as introducing reporting checklists. We conducted a systematic review comparing articles submitted to Nature Neuroscience in the 3 months prior to the introduction of checklists (n=36) that were subsequently published with articles submitted to Nature Neuroscience in the 3 months immediately after checklists (n=45), along with articles from a comparison journal, Neuroscience, over the same 3-month period (n=123). We found that although the proportion of studies commenting on sample sizes increased after checklists were introduced (22% vs 53%), the proportion reporting formal power calculations decreased (14% vs 9%). Using sample size calculations for 80% power and a significance level of 5%, we found little evidence that sample sizes were adequate to achieve this level of statistical power, even for large effect sizes. Our analysis suggests that reporting checklists may not improve the use and reporting of formal power calculations.
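The kind of formal calculation the review checked for can be sketched with the standard normal-approximation formula for a two-group comparison of means at 80% power and a two-sided 5% significance level. This is a generic textbook sketch in Python, not code from the study:

```python
import math

def n_per_group(effect_size, z_alpha=1.959964, z_beta=0.841621):
    """Smallest per-group n for a two-sample comparison of means
    (normal approximation; defaults give two-sided alpha = 0.05
    and power = 0.80). effect_size is Cohen's d."""
    n = 2.0 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)
```

For Cohen's conventional large, medium and small effects (d = 0.8, 0.5, 0.2), this gives roughly 25, 63 and 393 participants per group, which illustrates why detecting anything smaller than a large effect requires sample sizes well beyond what the review typically found.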

2020 ◽  
Vol 16 (1) ◽  
Author(s):  
Nader Salari ◽  
Habibolah Khazaie ◽  
Amin Hosseinian-Far ◽  
Hooman Ghasemi ◽  
Masoud Mohammadi ◽  
...  

Abstract Background In all epidemics, healthcare staff are at the centre of the risks and harms caused by pathogens. Today, nurses and physicians face unprecedented work pressures due to the COVID-19 pandemic, resulting in psychological disorders such as stress, anxiety and sleep disturbances. The aim of this study is to investigate the prevalence of sleep disturbances in hospital nurses and physicians caring for COVID-19 patients. Method A systematic review and meta-analysis was conducted in accordance with the PRISMA criteria. The PubMed, Scopus, Science Direct, Web of Science, CINAHL, Medline, and Google Scholar databases were searched with no lower time limit and up until 24 June 2020. The heterogeneity of the studies was measured using the I² test, and publication bias was assessed by Egger's test at the significance level of 0.05. Results After following the systematic review process, 7 cross-sectional studies were selected for meta-analysis. Based on the I² test, heterogeneity was high in both groups (nurses: I² = 97.4%; physicians: I² = 97.3%). Six studies with a total sample of 3,745 nurses were examined, and the prevalence of sleep disturbances was estimated at 34.8% (95% CI: 24.8-46.4%). The prevalence of sleep disturbances in physicians was measured in 5 studies with a total sample of 2,123 physicians; in physicians caring for COVID-19 patients it was 41.6% (95% CI: 27.7-57%). Conclusion Healthcare workers, as the front line of the fight against COVID-19, are more vulnerable to the harmful effects of this disease than other groups in society. Increasing workplace stress increases sleep disturbances among medical staff, especially nurses and physicians. In other words, increased stress due to exposure to COVID-19 increases the prevalence of sleep disturbances in nurses and physicians. It is therefore important for health policymakers to provide solutions and interventions that reduce workplace stress and pressure on medical staff.
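The pooling described in the Methods can be illustrated with a minimal random-effects meta-analysis of prevalences: the helper below computes Cochran's Q, I², and a DerSimonian-Laird pooled estimate on the logit scale. It is a sketch of the standard approach with hypothetical inputs; the study's actual software and model settings are not stated here.

```python
import math

def pool_prevalence(events, totals):
    """DerSimonian-Laird random-effects pooling of logit-transformed
    prevalences; returns (pooled prevalence, I-squared in percent).
    Assumes 0 < events < total for every study."""
    # Logit proportions and their approximate variances
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]
    w = [1.0 / vi for vi in v]
    # Fixed-effect estimate and Cochran's Q
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))
    k = len(y)
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    # Between-study variance (DerSimonian-Laird), then random-effects weights
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    # Back-transform from the logit scale
    return 1.0 / (1.0 + math.exp(-y_re)), i2

# Hypothetical (cases, total) counts from three studies:
pooled, i_squared = pool_prevalence([120, 300, 90], [400, 700, 350])
```

Very high I² values like those reported (above 97%) indicate that nearly all observed variation between studies reflects real differences rather than sampling error, which is why a random-effects model is used.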


2019 ◽  
Author(s):  
Peter E Clayson ◽  
Kaylie Amanda Carbine ◽  
Scott Baldwin ◽  
Michael J. Larson

Methodological reporting guidelines for studies of event-related potentials (ERPs) were updated in Psychophysiology in 2014. These guidelines facilitate the communication of key methodological parameters (e.g., preprocessing steps). Failing to report key parameters represents a barrier to replication efforts, and difficulty with replication increases in the presence of small sample sizes and low statistical power. We assessed whether guidelines are followed and estimated the average sample size and power in recent research. Reporting behavior, sample sizes, and statistical designs were coded for 150 randomly sampled articles from five high-impact journals that frequently publish ERP research from 2011 to 2017. An average of 63% of guidelines were reported, and reporting behavior was similar across journals, suggesting that gaps in reporting are a shortcoming of the field rather than any specific journal. Publication of the guidelines paper had no impact on reporting behavior, suggesting that editors and peer reviewers are not enforcing these recommendations. The average sample size per group was 21. Statistical power was conservatively estimated as .72-.98 for a large effect size, .35-.73 for a medium effect, and .10-.18 for a small effect. These findings indicate that failing to report key guidelines is ubiquitous and that ERP studies are primarily powered to detect large effects. Such low power and insufficient following of reporting guidelines represent substantial barriers to replication efforts. The methodological transparency and replicability of studies can be improved by the open sharing of processing code and experimental tasks and by a priori sample size calculations to ensure adequately powered studies.
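The order of magnitude of the power estimates above can be reproduced with a simple between-subjects approximation at the reported mean group size of 21, using Cohen's conventional benchmarks (d = 0.8, 0.5, 0.2). This is a generic normal-approximation sketch, not the authors' calculation (their ranges also reflect varied designs); it lands near the lower ends of the reported ranges:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sample_power(d, n_per_group, z_alpha=1.959964):
    """Approximate power of a two-sided two-sample test of means
    (normal approximation); d is Cohen's d, alpha = 0.05."""
    delta = abs(d) * math.sqrt(n_per_group / 2.0)
    # Probability the test statistic exceeds the critical value
    # in either tail when the true noncentrality is delta
    return normal_cdf(delta - z_alpha) + normal_cdf(-delta - z_alpha)
```

With n = 21 per group this gives roughly .74 for d = 0.8, .37 for d = 0.5 and .10 for d = 0.2, consistent with the conclusion that typical ERP studies are powered only for large effects.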


2021 ◽  
Vol 3 (1) ◽  
pp. 61-89
Author(s):  
Stefan Geiß

Abstract This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In most widespread sample size/effect size settings, the rule-of-thumb that chance-adjusted agreement should be ≥.800 or ≥.667 corresponds to the simulation results, resulting in acceptable α and β error rates. However, this simulation approach allows precise power calculations that can consider the specifics of each study’s context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g. when constructs are hard to measure). I supply equations, easy-to-use tables and R functions to facilitate use of this framework, along with example code as an online appendix.
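The core logic of such a simulation, that coder error attenuates the observed correlation by roughly the square root of the reliability and thereby lowers power, can be sketched as follows. This is an illustrative Monte Carlo in Python under simplified assumptions (normal variables, a Fisher-z significance test), not the article's actual R code:

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_power(n, true_r, reliability, sims=500, seed=1):
    """Monte Carlo power of a two-sided correlation test (Fisher z,
    alpha = 0.05) when the predictor is coded with error. Measurement
    error shrinks the observable correlation to true_r*sqrt(reliability)."""
    rng = random.Random(seed)
    z_alpha = 1.959964
    hits = 0
    for _ in range(sims):
        xs, ys = [], []
        for _ in range(n):
            x = rng.gauss(0, 1)
            y = true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
            # coder error lowers the reliability of the measured x
            x_obs = (math.sqrt(reliability) * x
                     + math.sqrt(1 - reliability) * rng.gauss(0, 1))
            xs.append(x_obs)
            ys.append(y)
        z = math.atanh(pearson(xs, ys)) * math.sqrt(n - 3)
        if abs(z) > z_alpha:
            hits += 1
    return hits / sims
```

For example, with n = 100 and a true correlation of .30, dropping reliability from 1.0 to 0.5 cuts the power of the test substantially, which is exactly the trade-off the study's tables quantify.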


2020 ◽  
Author(s):  
Chia-Lung Shih ◽  
Te-Yu Hung

Abstract Background A small sample size (n < 30 for each treatment group) is usually enrolled to investigate the differences in efficacy between treatments for knee osteoarthritis (OA). The objective of this study was to use simulation to compare the power of four statistical methods for analysis of small samples for detecting the differences in efficacy between two treatments for knee OA. Methods A total of 10,000 replicates of 5 sample sizes (n=10, 15, 20, 25, and 30 for each group) were generated based on previously reported measures of treatment efficacy. Four statistical methods were used to compare the differences in efficacy between treatments: the two-sample t-test (t-test), the Mann-Whitney U-test (M-W test), the Kolmogorov-Smirnov test (K-S test), and the permutation test (perm-test). Results The bias of the simulated parameter means showed a decreasing trend with sample size, but the CV% of the simulated parameter means varied with sample size for all parameters. For the largest sample size (n=30), the CV% reached a small level (<20%) for almost all parameters, but the bias did not. Among the non-parametric tests for analysis of small samples, the perm-test had the highest statistical power, and its false positive rate was not affected by sample size. However, the power of the perm-test did not reach a high value (80%) even at the largest sample size (n=30). Conclusion The perm-test is suggested for analysis of small samples to compare the differences in efficacy between two treatments for knee OA.
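A permutation test of the kind recommended here is straightforward to sketch: pool the two groups, repeatedly reshuffle the labels, and count how often the reshuffled difference in means is at least as extreme as the observed one. This is a generic two-sided sketch, not the authors' simulation code:

```python
import random

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means.
    Returns an approximate p-value based on n_perm random relabelings."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            extreme += 1
    # +1 correction keeps the estimated p-value strictly positive
    return (extreme + 1) / (n_perm + 1)
```

Because the test uses the data's own distribution rather than a parametric assumption, its false positive rate stays at the nominal level even for the very small groups (n = 10) considered in the study.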


2021 ◽  
Author(s):  
Alice Carter ◽  
Kate Tilling ◽  
Marcus Robert Munafo

The sample size of a study is a key design and planning consideration. However, sample size and power calculations are often either poorly reported or not reported at all, which suggests they may not form a routine part of study planning. Inadequate understanding of sample size and statistical power can result in poor quality studies. Journals increasingly require a justification of sample size, for example through the use of reporting checklists. However, for meaningful improvements in research quality to be made, researchers need to consider sample size and power at the design stage of a study, rather than at the publication stage. Here we briefly illustrate sample size and statistical power in the context of different research questions and how they should be viewed as a critical design consideration.


2021 ◽  
Author(s):  
Benjamin J Burgess ◽  
Michelle C Jackson ◽  
David J Murrell

1. Most ecosystems are subject to co-occurring, anthropogenically driven changes and understanding how these multiple stressors interact is a pressing concern. Stressor interactions are typically studied using null models, with the additive and multiplicative null expectations being the most widely applied. Such approaches classify interactions as synergistic, antagonistic, reversal, or indistinguishable from the null expectation. Despite their widespread use, there has been no thorough analysis of these null models, nor a systematic test of the robustness of their results to sample size or sampling error in the estimates of the responses to stressors. 2. We use data simulated from food web models where the true stressor interactions are known, and analytical results based on the null model equations, to uncover how (i) sample size, (ii) variation in biological responses to the stressors and (iii) statistical significance affect the ability to detect non-null interactions. 3. Our analyses lead to three main results. Firstly, it is clear that the additive and multiplicative null models are not directly comparable, and over one third of all simulated interactions had classifications that were model dependent. Secondly, both null models have weak power to correctly classify interactions at commonly implemented sample sizes (i.e., ≤6 replicates), unless data uncertainty is unrealistically low. This means all but the most extreme interactions are indistinguishable from the null model expectation. Thirdly, we show that increasing sample size increases the power to detect the true interactions, but only very slowly. The biggest gains come from increasing replicates from 3 up to 25, and we provide an R function for users to determine the sample sizes required to detect a critical effect size of biological interest for the additive model. 4. Our results will aid researchers in the design of their experiments and the subsequent interpretation of results. 
We find no clear statistical advantage of using one null model over the other and argue that null model choice should be based on biological relevance rather than statistical properties. However, there is a pressing need to increase experimental sample sizes, otherwise many biologically important synergistic and antagonistic stressor interactions will continue to be missed.
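The two null expectations at the heart of the analysis can be written down directly: for mean responses C (control), A and B (each stressor alone), the additive null predicts A + B − C for the combined treatment, while the multiplicative null predicts AB/C. The sketch below uses hypothetical numbers (not the authors' food-web code) and a toy tolerance in place of a proper significance test, to show how the same observation can be classified differently under the two models:

```python
def null_expectations(control, only_a, only_b):
    """Expected combined-stressor response under the additive and
    multiplicative null models, given mean responses (e.g. biomass)
    in the control and single-stressor treatments."""
    additive = only_a + only_b - control
    multiplicative = control * (only_a / control) * (only_b / control)
    return additive, multiplicative

def classify(observed, expected, tol):
    """Toy classification; real analyses compare against the sampling
    error of each treatment mean, which is the paper's central concern.
    Assumes stressors reduce the response, so falling below the
    expectation means a stronger-than-predicted combined effect."""
    if abs(observed - expected) <= tol:
        return "null"
    return "synergistic" if observed < expected else "antagonistic"
```

For example, with control = 100 and single-stressor responses of 50 each, the additive null expects 0 while the multiplicative null expects 25; an observed combined response of 10 is then antagonistic under one model and synergistic under the other, illustrating the model-dependence result above.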


Background: Emotional intelligence (EI) involves a combination of competencies that allow a person to be aware of and understand the emotions of others, and to use this knowledge to foster their own and others' success. Objective: This study aims to provide a systematic review of published research on emotional intelligence among women. Methodology: Papers were selected in January 2019 using the search terms “Emotional intelligence”, “Emotional intelligence among women” and “Gender in emotional intelligence” in five databases: Scopus, PsycINFO, Springer, Google Scholar, and ScienceDirect. Twenty-six research-based articles published from 2010 to 2018 were evaluated. Results: The analysis of the published articles considered two central themes in the study of emotional intelligence among women: the level of women's emotional intelligence and emotional intelligence attributes. The reviewed studies generally reported low levels of emotional intelligence among women. Similarly, the studies identified emotional intelligence attributes among women, including empathy, social responsibility, stress tolerance, emotional self-awareness, emotional expression, independence, flexibility, problem solving, impulse control, interpersonal relationships and optimism. Conclusions: The studies analysed in this review show clear methodological weaknesses, such as small sample sizes, yet only a few recognised and reported this limitation. Similarly, none of these studies investigated the causes of the low levels of emotional intelligence reported among women. The findings add to the growing empirical evidence regarding emotional intelligence. Future research should address these limitations; for better generalisation of findings, sample size should always be considered at the design stage. Future work should also examine in more detail the role of cognitive and other factors in determining emotional intelligence among women.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Nader Salari ◽  
Hooman Ghasemi ◽  
Loghman Mohammadi ◽  
Mohammad hasan Behzadi ◽  
Elham Rabieenia ◽  
...  

Abstract Background Osteoporosis affects all sections of society, including patients and their family members, government agencies bearing the costs of treatment and medical care, and medical institutes in various fields. Providing a comprehensive picture of the global prevalence of osteoporosis is important for health policymakers to make appropriate decisions. Therefore, this study was conducted to investigate the prevalence of osteoporosis worldwide. Methods A systematic review and meta-analysis were conducted in accordance with the PRISMA criteria. The PubMed, Science Direct, Web of Science, Scopus, Magiran, and Google Scholar databases were searched with no lower time limit up until 26 August 2020. The heterogeneity of the studies was measured using the I² test, and publication bias was assessed by Begg and Mazumdar's test at the significance level of 0.1. Results After following the systematic review process, 86 studies were selected for meta-analysis. The combined sample size was 103,334,579 people in the age range of 15–105 years. Using meta-analysis, the worldwide prevalence of osteoporosis was estimated at 18.3% (95% CI 16.2–20.7). Based on 70 studies with a sample of 800,457 women (heterogeneity I² = 99.8%), the prevalence of osteoporosis in women worldwide was 23.1% (95% CI 19.8–26.9), while the prevalence among men worldwide was 11.7% (95% CI 9.6–14.1), based on 40 studies with a sample of 453,964 men. The highest prevalence was reported in Africa, at 39.5% (95% CI 22.3–59.7), with a sample of 2,989 people aged 18–95 years. 
Conclusion Given the medical, economic, and social burden of osteoporosis, providing a robust and comprehensive estimate of its worldwide prevalence can facilitate decisions in health system planning and policymaking, including an overview of the current situation and the outlook for the future; provide the necessary facilities for the treatment of people with osteoporosis; reduce the severe risks that lead to death by preventing fractures; and, finally, monitor the overall state of osteoporosis in the world. This study is the first to report a structured review and meta-analysis of the prevalence of osteoporosis worldwide.


2017 ◽  
Author(s):  
Clarissa F. D. Carneiro ◽  
Thiago C. Moulin ◽  
Malcolm R. Macleod ◽  
Olavo B. Amaral

Abstract Proposals to increase research reproducibility frequently call for focusing on effect sizes instead of p values, as well as for increasing the statistical power of experiments. However, it is unclear to what extent these two concepts are indeed taken into account in basic biomedical science. To study this in a real-case scenario, we performed a systematic review of effect sizes and statistical power in studies on learning of rodent fear conditioning, a widely used behavioral task to evaluate memory. Our search criteria yielded 410 experiments comparing control and treated groups in 122 articles. Interventions had a mean effect size of 29.5%, and amnesia caused by memory-impairing interventions was nearly always partial. Mean statistical power to detect the average effect size observed in well-powered experiments with significant differences (37.2%) was 65%, and was lower among studies with non-significant results. Only one article reported a sample size calculation, and our estimated sample size to achieve 80% power considering typical effect sizes and variances (15 animals per group) was reached in only 12.2% of experiments. Actual effect sizes correlated with effect size inferences made by readers on the basis of textual descriptions of results only when findings were non-significant, and neither effect size nor power correlated with study quality indicators, number of citations or impact factor of the publishing journal. In summary, effect sizes and statistical power have a wide distribution in the rodent fear conditioning literature, but do not seem to have a large influence on how results are described or cited. Failure to take these concepts into consideration might limit attempts to improve reproducibility in this field of science.

