Calibrate your confidence in research findings: A tutorial on improving research methods and practices

Journal of Pacific Rim Psychology ◽

10.1017/prp.2020.7 ◽

2020 ◽

Vol 14 ◽

Cited By ~ 5

Author(s):

Aline da Silva Frost ◽

Alison Ledgerwood

Keyword(s):

Research Methods ◽

Statistical Power ◽

Type I Error ◽

Effect Sizes ◽

Online Media ◽

Type I ◽

Journal Articles ◽

Psychological Science ◽

Different Types ◽

Research Findings

This article provides an accessible tutorial with concrete guidance for how to start improving research methods and practices in your lab. Following recent calls to improve research methods and practices within and beyond the borders of psychological science, resources have proliferated across book chapters, journal articles, and online media. Many researchers are interested in learning more about cutting-edge methods and practices but are unsure where to begin. In this tutorial, we describe specific tools that help researchers calibrate their confidence in a given set of findings. In Part I, we describe strategies for assessing the likely statistical power of a study, including when and how to conduct different types of power calculations, how to estimate effect sizes, and how to think about power for detecting interactions. In Part II, we provide strategies for assessing the likely Type I error rate of a study, including distinguishing clearly between data-independent (“confirmatory”) and data-dependent (“exploratory”) analyses and thinking carefully about different forms and functions of preregistration.


Calibrate Your Confidence in Research Findings: A Tutorial on Improving Research Methods and Practices

10.31234/osf.io/6uxkb ◽

2020 ◽

Author(s):

Aline da Silva Frost ◽

Alison Ledgerwood

Keyword(s):

Research Methods ◽

Statistical Power ◽

Type I Error ◽

Effect Sizes ◽

Online Media ◽

Type I ◽

Journal Articles ◽

Psychological Science ◽

Different Types ◽

Research Findings

This article provides an accessible tutorial with concrete guidance for how to start improving research methods and practices in your lab. Following recent calls to improve research methods and practices within and beyond the borders of psychological science, resources have proliferated across book chapters, journal articles, and online media. Many researchers are interested in learning more about cutting-edge methods and practices, but are unsure where to begin. In this tutorial, we describe specific tools that help researchers calibrate their confidence in a given set of findings. In Part I, we describe strategies for assessing the likely statistical power of a study, including when and how to conduct different types of power calculations, how to estimate effect sizes, and how to think about power for detecting interactions. In Part II, we provide strategies for assessing the likely Type I error rate of a study, including distinguishing clearly between data-independent (“confirmatory”) and data-dependent (“exploratory”) analyses and thinking carefully about different forms and functions of preregistration.


Is It Really Robust?

Methodology ◽

10.1027/1614-2241/a000016 ◽

2010 ◽

Vol 6 (4) ◽

pp. 147-151 ◽

Cited By ~ 385

Author(s):

Emanuel Schmider ◽

Matthias Ziegler ◽

Erik Danay ◽

Luzi Beyer ◽

Markus Bühner

Keyword(s):

Goodness Of Fit ◽

Type I Error ◽

Effect Sizes ◽

Random Numbers ◽

Type I ◽

Type II ◽

Type II Error ◽

Normality Assumption ◽

Different Types ◽

Factor Type

Empirical evidence for the robustness of the analysis of variance (ANOVA) against violation of the normality assumption is presented by means of Monte Carlo methods. High-quality samples from normally, rectangularly, and exponentially distributed populations are created by drawing random numbers from the respective generators, checking their goodness of fit, and allowing only the best 10% to take part in the investigation. A one-way fixed-effect design with three groups of 25 values each is chosen. Effect sizes are implemented in the samples and varied over a broad range. Comparing the outcomes of the ANOVA calculations for the different types of distributions gives reason to regard the ANOVA as robust. Both the empirical type I error α and the empirical type II error β remain constant under violation. Moreover, regression analysis identifies the factor “type of distribution” as not significant in explaining the ANOVA results.
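The kind of simulation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it omits their goodness-of-fit screening (keeping only the best 10% of samples), covers only the null case (no effect implemented), and uses hypothetical function names. It relies on the fact that for three groups the F statistic has 2 numerator degrees of freedom, for which the p-value has the closed form (1 + 2F/df2)^(-df2/2), so only the Python standard library is needed.

```python
import random
from statistics import fmean


def anova_p_three_groups(groups):
    """p-value of a one-way ANOVA for exactly three groups.

    With 2 numerator degrees of freedom the F survival function is
    (1 + 2F/df2) ** (-df2/2), so no statistics library is required.
    """
    if len(groups) != 3:
        raise ValueError("closed-form p-value assumes exactly three groups")
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df2 = len(all_values) - 3
    f_stat = (ss_between / 2) / (ss_within / df2)
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


def empirical_type_i_error(sampler, reps=2000, n_per_group=25, alpha=0.05):
    """Share of null simulations (all groups from one population) with p < alpha."""
    rejections = 0
    for _ in range(reps):
        groups = [[sampler() for _ in range(n_per_group)] for _ in range(3)]
        if anova_p_three_groups(groups) < alpha:
            rejections += 1
    return rejections / reps


def run_simulation(seed=1):
    """Estimate the type I error rate under three population shapes."""
    rng = random.Random(seed)
    samplers = {
        "normal": lambda: rng.gauss(0.0, 1.0),
        "rectangular": lambda: rng.uniform(0.0, 1.0),
        "exponential": lambda: rng.expovariate(1.0),
    }
    return {name: empirical_type_i_error(s) for name, s in samplers.items()}


if __name__ == "__main__":
    for name, rate in run_simulation().items():
        print(f"{name:12s} empirical type I error: {rate:.3f}")
```

If the ANOVA is robust in the sense the abstract describes, the three printed rates should all sit near the nominal 0.05, within simulation noise.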


A Meta-Meta-Analysis: Empirical Review of Statistical Power, Type I Error Rates, Effect Sizes, and Model Selection of Meta-Analyses Published in Psychology

Multivariate Behavioral Research ◽

10.1080/00273171003680187 ◽

2010 ◽

Vol 45 (2) ◽

pp. 239-270 ◽

Cited By ~ 46

Author(s):

Guy Cafri ◽

Jeffrey D. Kromrey ◽

Michael T. Brannick

Keyword(s):

Statistical Power ◽

Type I Error ◽

Meta Analysis ◽

Error Rates ◽

Effect Sizes ◽

Type I ◽

Power Type ◽

Type I Error Rates ◽

Meta Analyses ◽

Selection Of


How to Detect Publication Bias in Psychological Research

Zeitschrift für Psychologie ◽

10.1027/2151-2604/a000386 ◽

2019 ◽

Vol 227 (4) ◽

pp. 261-279 ◽

Cited By ~ 2

Author(s):

Frank Renkewitz ◽

Melanie Keiner

Keyword(s):

Publication Bias ◽

Effect Size ◽

Statistical Power ◽

Type I Error ◽

Psychological Research ◽

Type I ◽

True Effect Size ◽

Questionable Research Practices ◽

True Effect ◽

Meta Analyses

Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.


Supplemental Material for Meta-Analysis to Integrate Effect Sizes Within an Article: Possible Misuse and Type I Error Inflation

Journal of Experimental Psychology General ◽

10.1037/xge0000159.supp ◽

2016 ◽

Keyword(s):

Type I Error ◽

Meta Analysis ◽

Effect Sizes ◽

Type I


A Multi-faceted Mess: A Review of Statistical Power Analysis in Psychology Journal Articles

10.31234/osf.io/3bdfu ◽

2019 ◽

Cited By ~ 2

Author(s):

Rob Cribbie ◽

Nataly Beribisky ◽

Udi Alter

Keyword(s):

Sample Size ◽

Effect Size ◽

Power Analysis ◽

Statistical Power ◽

Type I Error ◽

A Priori ◽

Type I ◽

Specific Level ◽

Maximum Sample Size ◽

Power Analyses

Many bodies recommend that a sample planning procedure, such as traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, there are several drawbacks to the procedure that render it “a mess.” Specifically, identifying the minimally meaningful effect size is often difficult but unavoidable for conducting the procedure properly, the procedure is not precision oriented, and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether these issues are concerns in practice. To investigate how power analysis is currently used, this study reviewed the reporting of 443 power analyses in high impact psychology journals in 2016 and 2017. It was found that researchers rarely use the minimally meaningful effect size as a rationale for the chosen effect in a power analysis. Further, precision-based approaches and collecting the maximum sample size feasible are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers should focus on tools beyond traditional power analysis when sample planning, such as collecting the maximum sample size feasible.
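To make the relationship between sample size, effect size, power, and Type I error rate concrete, here is a small sketch of an approximate power calculation for a two-sided comparison of two independent group means. It is an illustration only and is not taken from the article: it uses the normal approximation to the t-test (which slightly overstates power in small samples), and the function name and default values are assumptions.

```python
import math
from statistics import NormalDist


def approx_power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    d            standardized mean difference (Cohen's d) assumed to be true
    n_per_group  participants in each of the two equally sized groups
    alpha        two-sided Type I error rate

    Uses the normal approximation; an exact answer would use the
    noncentral t distribution.
    """
    if n_per_group < 2:
        raise ValueError("n_per_group must be at least 2")
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (20, 64, 100, 200):
        print(n, round(approx_power_two_groups(0.5, n), 3))
```

Running the loop for a fixed assumed effect shows how quickly power changes with the sample size, which is one reason the choice of the effect size entered into the calculation matters so much.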


Cognitive tests used in chronic adult human randomised controlled trial micronutrient and phytochemical intervention studies

Nutrition Research Reviews ◽

10.1017/s0954422410000119 ◽

2010 ◽

Vol 23 (2) ◽

pp. 200-229 ◽

Cited By ~ 25

Author(s):

Anna L. Macready ◽

Laurie T. Butler ◽

Orla B. Kennedy ◽

Judi A. Ellis ◽

Claire M. Williams ◽

...

Keyword(s):

Randomised Controlled Trial ◽

Statistical Power ◽

Type I Error ◽

Spatial Working Memory ◽

Controlled Trial ◽

Type I ◽

Cognitive Tests ◽

Cognitive Domains ◽

Positive Effects ◽

Randomised Controlled

In recent years there has been a rapid growth of interest in exploring the relationship between nutritional therapies and the maintenance of cognitive function in adulthood. Emerging evidence reveals an increasingly complex picture with respect to the benefits of various food constituents on learning, memory and psychomotor function in adults. However, to date, there has been little consensus in human studies on the range of cognitive domains to be tested or the particular tests to be employed. To illustrate the potential difficulties that this poses, we conducted a systematic review of existing human adult randomised controlled trial (RCT) studies that have investigated the effects of 24 d to 36 months of supplementation with flavonoids and micronutrients on cognitive performance. There were thirty-nine studies employing a total of 121 different cognitive tasks that met the criteria for inclusion. Results showed that less than half of these studies reported positive effects of treatment, with some important cognitive domains either under-represented or not explored at all. Although there was some evidence of sensitivity to nutritional supplementation in a number of domains (for example, executive function, spatial working memory), interpretation is currently difficult given the prevailing ‘scattergun approach’ for selecting cognitive tests. Specifically, the practice means that it is often difficult to distinguish between a boundary condition for a particular nutrient and a lack of task sensitivity. We argue that for significant future progress to be made, researchers need to pay much closer attention to existing human RCT and animal data, as well as to more basic issues surrounding task sensitivity, statistical power and type I error.


Required sample size for comparing two independent means

Marine Medicine ◽

10.22328/2413-5747-2020-6-2-106-113 ◽

2020 ◽

Vol 6 (2) ◽

pp. 106-113

Author(s):

A. M. Grjibovski ◽

M. A. Gorbatova ◽

A. N. Narkevich ◽

K. A. Vinogradov

Keyword(s):

Sample Size ◽

Statistical Power ◽

Type I Error ◽

Sample Size Calculation ◽

Biomedical Literature ◽

Type I ◽

Research Practice ◽

False Null Hypothesis ◽

Different Levels ◽

Russian Research

Sample size calculation in the planning phase is still uncommon in Russian research practice. This situation threatens the validity of the conclusions and may introduce Type II error, when a false null hypothesis is accepted because the study lacks the statistical power to detect an existing difference between the means. Comparing two means using unpaired Student’s t-tests is the most common statistical procedure in the Russian biomedical literature. However, calculations of the minimal required sample size or retrospective calculations of statistical power were observed in only very few publications. In this paper we demonstrate how to calculate the required sample size for comparing means in unpaired samples using WinPepi and Stata software. In addition, we produced tables of the minimal required sample size for studies in which two means have to be compared and body mass index and blood pressure are the variables of interest. The tables were constructed for unpaired samples for different levels of statistical power and standard deviations obtained from the literature.
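For readers without WinPepi or Stata, the calculation described here can be approximated with the textbook formula n = 2(z_{1-α/2} + z_{1-β})² σ² / Δ² per group. The sketch below is an illustration under stated assumptions, not the paper's procedure: it uses the normal approximation (dedicated software that works with the t distribution typically returns a value one or two participants larger), assumes equal group sizes and a common standard deviation, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two independent means.

    delta  smallest difference between the means worth detecting
    sd     common standard deviation assumed for both groups
    alpha  two-sided Type I error rate
    power  desired probability of detecting delta (1 - Type II error rate)
    """
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be non-zero and sd must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)  # round up: a fractional participant cannot be recruited


if __name__ == "__main__":
    # Hypothetical inputs: detect a 5-unit difference when the SD is 10 units.
    print(n_per_group(delta=5, sd=10))              # 80% power
    print(n_per_group(delta=5, sd=10, power=0.90))  # 90% power
```

The inputs in the usage lines are made-up values chosen for illustration; in practice the standard deviation would come from the literature or pilot data, as the abstract notes.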


Bayesian Two-Stage Adaptive Design in Bioequivalence

The International Journal of Biostatistics ◽

10.1515/ijb-2018-0105 ◽

2019 ◽

Vol 16 (1) ◽

Cited By ~ 2

Author(s):

Shengjie Liu ◽

Jun Gao ◽

Yuling Zheng ◽

Lei Huang ◽

Fangrong Yan

Keyword(s):

Statistical Power ◽

Adaptive Design ◽

Type I Error ◽

Probability Model ◽

Type I ◽

Two Stage ◽

Stage Design ◽

Estimation Strategy ◽

Drug Products ◽

Two Stage Design

Bioequivalence (BE) studies are an integral component of the new drug development process and play an important role in the approval and marketing of generic drug products. However, existing design and evaluation methods largely fall under the framework of frequentist theory, and few implement Bayesian ideas. Based on the bioequivalence predictive probability model and a sample re-estimation strategy, we propose a new Bayesian two-stage adaptive design and explore its application in bioequivalence testing. The new design differs from existing two-stage designs (such as Potvin’s methods B and C) in the following aspects. First, it not only incorporates historical information and expert information, but also combines experimental data flexibly to aid decision-making. Second, its sample re-estimation strategy is based on the ratio of the information in the interim analysis to the total information, which is simpler to calculate than Potvin’s method. Simulation results showed that the two-stage design can be combined with various stopping boundary functions, and that the results differ between them. Moreover, the proposed method requires a smaller sample size than Potvin’s method under the conditions that the type I error rate is below 0.05 and statistical power reaches 80%.


Exploring Research-Methods Blogs in Psychology: Who Posts What About Whom, and With What Effect?

Perspectives on Psychological Science ◽

10.1177/1745691619835216 ◽

2019 ◽

Vol 14 (4) ◽

pp. 691-704

Author(s):

Gandalf Nicolas ◽

Xuechunzi Bai ◽

Susan T. Fiske

Keyword(s):

Social Media ◽

Research Methods ◽

Statistical Power ◽

Scientific Practices ◽

Equity Issues ◽

Media Impact ◽

Scientific Discussion ◽

Male Individual ◽

Research Findings ◽

Historical Moment

During the methods crisis in psychology and other sciences, much discussion developed online in forums such as blogs and other social media. Hence, this increasingly popular channel of scientific discussion itself needs to be explored to inform current controversies, record the historical moment, improve methods communication, and address equity issues. Who posts what about whom, and with what effect? Does a particular generation or gender contribute more than another? Do blogs focus narrowly on methods, or do they cover a range of issues? How do they discuss individual researchers, and how do readers respond? What are some impacts? Web-scraping and text-analysis techniques provide a snapshot characterizing 41 current research-methods blogs in psychology. Bloggers mostly represented psychology’s traditional leaderships’ demographic categories: primarily male, mid- to late career, associated with American institutions, White, and with established citation counts. As methods blogs, their posts mainly concern statistics, replication (particularly statistical power), and research findings. The few posts that mentioned individual researchers substantially focused on replication issues; they received more views, social-media impact, comments, and citations. Male individual researchers were mentioned much more often than female researchers. Further data can inform perspectives about these new channels of scientific communication, with the shared aim of improving scientific practices.
