The P Value Line Dance: When Does the Music Stop?


10.2196/21345 ◽  
2020 ◽  
Vol 22 (8) ◽  
pp. e21345 ◽  
Author(s):  
Marcus Bendtsen

When should a trial stop? Such a seemingly innocent question evokes concerns of type I and II errors among those who believe that certainty can be the product of uncertainty and among researchers who have been told that they need to carefully calculate sample sizes, consider multiplicity, and not spend P values on interim analyses. However, the endeavor to dichotomize evidence into significant and nonsignificant has caused the basic driving force of science, namely uncertainty, to take a back seat. In this viewpoint we argue that if testing the null hypothesis were the ultimate goal of science, then we would not need to write protocols, consider ethics, apply for funding, or run any experiments at all: all null hypotheses will be rejected at some point, because everything has an effect. The job of science should be to unearth the uncertainties of the effects of treatments, not to test their difference from zero. We also show the fickleness of P values: one day they may point to statistically significant results, and after a few more participants have been recruited, the once statistically significant effect suddenly disappears. We show plots that we hope intuitively highlight that all assessments of evidence will fluctuate over time. Finally, we discuss the remedy in the form of Bayesian methods, in which uncertainty leads, and which allow for continuous decisions to stop or continue recruitment as new data from a trial accumulate.
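The "line dance" is easy to reproduce. The following sketch (our illustration, not from the paper; the effect size, look schedule, and seed are assumptions) tracks the p-value of a running two-sample t test as participants accrue and counts how often it crosses the 0.05 line:

```python
# Minimal sketch of a p-value trajectory under repeated interim looks.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
effect = 0.2                                 # assumed small true standardized effect
control = rng.normal(0.0, 1.0, 500)
treated = rng.normal(effect, 1.0, 500)

trajectory = []
for n in range(10, 501, 10):                 # interim look every 10 participants per arm
    p = stats.ttest_ind(treated[:n], control[:n]).pvalue
    trajectory.append((2 * n, p))

crossings = sum(
    (p1 < 0.05) != (p2 < 0.05)
    for (_, p1), (_, p2) in zip(trajectory, trajectory[1:])
)
print(f"p crossed the 0.05 line {crossings} time(s) as data accumulated")
```

Rerunning with different seeds shows that the number and timing of crossings is itself highly variable, which is the point the plots in the viewpoint make.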


1996 ◽  
Vol 1 (1) ◽  
pp. 25-28 ◽  
Author(s):  
Martin A. Weinstock

Background: An accurate understanding of certain basic statistical terms and principles is key to critical appraisal of the published literature. Objective: This review describes type I error, type II error, null hypothesis, p value, statistical significance, α, two-tailed and one-tailed tests, effect size, alternative hypothesis, statistical power, β, publication bias, confidence interval, standard error, and standard deviation, with examples from reports of dermatologic studies. Conclusion: The application of the results of published studies to individual patients should be informed by an understanding of certain basic statistical concepts.
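As a concrete illustration of how several of these terms interlock (our sketch, not from the review): once α and the target power 1 − β are fixed, the effect size determines the required sample size. The effect size below is an assumed Cohen's d of 0.5.

```python
# Illustrative only: relating alpha, beta, effect size, and sample size
# for a two-sample t test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_arm = analysis.solve_power(effect_size=0.5,  # Cohen's d (assumed)
                                 alpha=0.05,       # type I error rate
                                 power=0.80)       # 1 - beta, i.e. type II error rate 0.20
print(f"participants needed per arm: {n_per_arm:.0f}")
```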


2019 ◽  
Author(s):  
Yiqi Yao ◽  
Alejandro Ochoa

Modern genetic association studies require modeling population structure and family relatedness in order to calculate correct statistics. Principal Components Analysis (PCA) is one of the most common approaches for modeling this population structure, but the Linear Mixed-Effects Model (LMM) is now believed by many to be a superior model. Remarkably, previous comparisons have been limited by testing PCA without varying the number of principal components (PCs), by simulating unrealistically simple population structures, and by not always measuring both type I error control and predictive power. In this work, we thoroughly evaluate PCA with varying numbers of PCs alongside the LMM in various realistic scenarios, including admixture together with family structure, measuring both null p-value uniformity and the area under the precision-recall curve. We find that PCA performs as well as the LMM when enough PCs are used and the sample size is large, and we find a remarkable robustness to extreme numbers of PCs. However, we notice decreased performance for PCA relative to the LMM when sample sizes are small and when there is family structure, although LMM performance is highly variable. Altogether, our work suggests that PCA is a favorable approach for association studies when sample sizes are large and no close relatives exist in the data, and that a hybrid approach of the LMM with PCs may be the best of both worlds.
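A minimal sketch of the PCA approach evaluated here (our illustration on simulated data, not the authors' pipeline; data shapes and names are assumptions): compute the top PCs of the standardized genotype matrix and include them as covariates in a per-variant regression.

```python
# Sketch of PCA-adjusted association testing (assumed layout: genotypes as an
# n_samples x n_variants dosage matrix, a single quantitative trait y).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, m, n_pcs = 200, 1000, 10
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = rng.normal(size=n)

# Top principal components of the standardized genotype matrix
g = (genotypes - genotypes.mean(0)) / (genotypes.std(0) + 1e-12)
u, s, _ = np.linalg.svd(g, full_matrices=False)
pcs = u[:, :n_pcs] * s[:n_pcs]               # sample-by-PC score matrix

pvals = []
for j in range(m):
    x = sm.add_constant(np.column_stack([genotypes[:, j], pcs]))
    fit = sm.OLS(y, x).fit()
    pvals.append(fit.pvalues[1])             # p-value for the variant term
print(f"smallest PCA-adjusted p-value: {min(pvals):.3g}")
```

Under this pure-null simulation the p-values should be roughly uniform, which is exactly the "null p-value uniformity" criterion the study measures.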


Author(s):  
Richard McCleary ◽  
David McDowall ◽  
Bradley J. Bartos

Chapter 6 addresses the sub-category of internal validity defined by Shadish et al. as statistical conclusion validity, or “validity of inferences about the correlation (covariance) between treatment and outcome.” The common threats to statistical conclusion validity can arise, or become plausible, through either model misspecification or hypothesis testing. The risk of a serious model misspecification is inversely proportional to the length of the time series, for example, and so is the risk of misstating the Type I and Type II error rates. Threats to statistical conclusion validity arise in both the classical and the modern hybrid significance testing structures; the serious threats that weigh heavily in p-value tests are shown to be undefined in Bayesian tests. While the particularly vexing threats raised by modern null hypothesis testing are resolved by eliminating the modern null hypothesis test, threats to statistical conclusion validity would inevitably persist and new threats would arise.
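To make the misspecification point concrete (our illustration, not from the chapter): fitting an interrupted time series by ordinary least squares while ignoring autocorrelated errors misstates the Type I error rate, as a short simulation under the null shows. The series length, AR coefficient, and intervention point are assumptions.

```python
# Illustration: a nominal 5% test applied to a short interrupted time series
# with AR(1) errors and no true intervention effect rejects far too often.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, phi, reps, hits = 60, 0.6, 2000, 0
step = (np.arange(n) >= n // 2).astype(float)   # intervention at the midpoint

for _ in range(reps):
    e = np.zeros(n)
    for t in range(1, n):                       # AR(1) noise, null is true
        e[t] = phi * e[t - 1] + rng.normal()
    fit = sm.OLS(e, sm.add_constant(step)).fit()
    hits += fit.pvalues[1] < 0.05

print(f"nominal 5% test rejected {hits / reps:.1%} of the time under the null")
```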


2018 ◽  
Author(s):  
Jing Zhai ◽  
Kenneth Knox ◽  
Homer L. Twigg ◽  
Hua Zhou ◽  
Jin J. Zhou

In metagenomics studies, testing the association between microbiome composition and clinical conditions translates to testing the nullity of variance components. Computationally efficient score tests have been the major tools, but they apply only to null hypotheses with a single variance component and require large sample sizes; they are therefore not applicable to longitudinal microbiome studies. In this paper, we propose exact tests (score test, likelihood ratio test, and restricted likelihood ratio test) to solve the problems of (1) testing the association of the overall microbiome composition in a longitudinal design and (2) detecting the association of one specific microbiome cluster while adjusting for the effects of related clusters. Our approach combines the exact tests for a null hypothesis with a single variance component with a strategy of reducing multiple variance components to a single one. Simulation studies demonstrate that our method has the correct type I error rate and superior power compared to existing methods at small sample sizes and with weak signals. Finally, we apply our method to a longitudinal pulmonary microbiome study of human immunodeficiency virus (HIV) infected patients and reveal two interesting genera, Prevotella and Veillonella, associated with forced vital capacity. Our findings shed light on the contribution of the lung microbiome to the complexities of HIV. The method is implemented in the open-source, high-performance computing language Julia and is freely available at https://github.com/JingZhai63/VCmicrobiome.
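The exact tests proposed here are specialized; as a simpler point of reference (not the authors' method), a single variance component can be tested with a likelihood ratio test whose null distribution is non-standard because the parameter lies on the boundary, commonly approximated by a 0.5χ²(0) + 0.5χ²(1) mixture. A sketch with simulated longitudinal data; the group structure and effect sizes are assumptions.

```python
# Sketch (not the paper's exact test): LRT for a single variance component
# (random intercept), using the 0.5*chi2(0) + 0.5*chi2(1) boundary mixture.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
groups = np.repeat(np.arange(30), 5)                 # 30 subjects, 5 visits each
u = rng.normal(0, 0.7, 30)[groups]                   # subject-level random effect
y = 1.0 + u + rng.normal(size=groups.size)
x = np.ones((groups.size, 1))

alt = sm.MixedLM(y, x, groups=groups).fit(reml=False)
null = sm.OLS(y, x).fit()
lrt = 2 * (alt.llf - null.llf)                       # both fitted by ML
p = 0.5 * stats.chi2.sf(lrt, df=1)                   # boundary-corrected p-value
print(f"LRT = {lrt:.2f}, mixture p = {p:.3g}")
```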


2019 ◽  
Author(s):  
Estibaliz Gómez-de-Mariscal ◽  
Alexandra Sneider ◽  
Hasini Jayatilaka ◽  
Jude M. Phillip ◽  
Denis Wirtz ◽  
...  

Biomedical research has come to rely on p-values to determine potential translational impact. The p-value is routinely compared with a threshold, commonly set to 0.05, to assess the significance of the null hypothesis. Whenever a large enough dataset is available, this threshold is easily reached. This phenomenon is known as p-hacking, and it leads to spurious conclusions. Herein, we propose a systematic and easy-to-follow protocol that models the p-value as an exponential function of the sample size to test the existence of real statistical significance. This new approach provides a robust assessment of the null hypothesis, with accurate values for the minimum data size needed to reject it. An in-depth study of the model is carried out on both simulated and experimentally obtained data. Simulations show that, under controlled data, our assumptions hold. The results of our analysis of the experimental datasets reflect the broad scope of this approach in common decision-making processes.
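The claim that a large enough dataset makes the 0.05 threshold reachable is easy to demonstrate (our illustration, not the paper's protocol): hold a practically negligible effect fixed and grow the sample.

```python
# Illustration: a negligible but nonzero effect becomes "statistically
# significant" once the sample size is large enough.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
tiny_effect = 0.02                       # assumed trivial standardized difference
for n in (100, 1_000, 10_000, 100_000):
    a = rng.normal(0, 1, n)
    b = rng.normal(tiny_effect, 1, n)
    p = stats.ttest_ind(a, b).pvalue
    print(f"n = {n:>6}: p = {p:.3g}")
```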


Author(s):  
Ryan D. Guggenmos ◽  
G. Bradley Bennett

Motivated by firms' increasing use of new media technology for investor communications, we investigate how alignment between company image and communication platform affects investor judgment and decision-making. In our first experiment, we demonstrate that investors expect alignment between firm image and the perception of the new media communication platform managers choose for investor relations. In a second experiment, we examine how this alignment affects investor judgment and decision-making. We predict and find that greater platform-image alignment leads investors to experience subjective ease of processing but does not change investment amounts. Additionally, we demonstrate an approach to conducting an explicit test of a null hypothesis by evaluating the convergence of null hypothesis significance testing (NHST) and Bayesian methods. Our findings have implications for researchers, firms, and investors and add to a growing literature on new media disclosure.
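For contrast with the NHST-Bayesian convergence approach described above, a conventional frequentist route to an explicit null test is equivalence testing via two one-sided tests (TOST). The sketch below is ours, not the authors' procedure; the equivalence bounds and data are illustrative.

```python
# Minimal TOST for two independent samples: reject "non-equivalence" when the
# mean difference is credibly inside the interval (low, high).
import numpy as np
from scipy import stats

def tost_ind(x, y, low, high):
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    pooled = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(pooled * (1 / nx + 1 / ny))
    df = nx + ny - 2
    p_low = stats.t.sf((diff - low) / se, df)     # H0: diff <= low
    p_high = stats.t.cdf((diff - high) / se, df)  # H0: diff >= high
    return max(p_low, p_high)                     # equivalence claimed if < alpha

rng = np.random.default_rng(5)
x, y = rng.normal(0, 1, 80), rng.normal(0, 1, 80)
print(f"TOST p = {tost_ind(x, y, -0.4, 0.4):.3f}")  # bounds are illustrative
```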


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Estibaliz Gómez-de-Mariscal ◽  
Vanesa Guerrero ◽  
Alexandra Sneider ◽  
Hasini Jayatilaka ◽  
Jude M. Phillip ◽  
...  

Biomedical research has come to rely on p-values as a deterministic measure for data-driven decision-making. In the widely used null hypothesis significance testing framework for identifying statistically significant differences among groups of observations, a single p-value is computed from sample data. It is then routinely compared with a threshold, commonly set to 0.05, to assess the evidence against the null hypothesis of no differences among groups. Because the estimated p-value tends to decrease as the sample size increases, applying this methodology to datasets with large sample sizes results in the rejection of the null hypothesis, making it uninformative in this specific situation. We propose a new approach to detect differences based on the dependence of the p-value on the sample size. We introduce new descriptive parameters that overcome the effect of size on the interpretation of the p-value in the framework of datasets with large sample sizes, reducing the uncertainty in the decision about the existence of biological differences between the compared experiments. The methodology enables the graphical and quantitative characterization of the differences between the compared experiments, guiding researchers in the decision process. An in-depth study of the methodology is carried out on simulated and experimental data. Code is available at https://github.com/BIIG-UC3M/pMoSS.
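In the spirit of the method (a simplified sketch, not the pMoSS API): estimate the p-value decay p(n) by repeatedly subsampling at each sample size, then fit an exponential a * exp(-c * n) and read off where it crosses 0.05. The choice of test and the subsampling scheme here are our assumptions.

```python
# Sketch: empirical p(n) curve by subsampling, then an exponential fit.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(4)
a_full = rng.normal(0.0, 1.0, 5000)
b_full = rng.normal(0.1, 1.0, 5000)        # assumed small real difference

sizes = np.arange(50, 2001, 50)
mean_p = []
for n in sizes:
    ps = [stats.mannwhitneyu(rng.choice(a_full, n), rng.choice(b_full, n)).pvalue
          for _ in range(50)]               # average over random subsamples
    mean_p.append(np.mean(ps))

decay = lambda n, a, c: a * np.exp(-c * n)
(a_hat, c_hat), _ = optimize.curve_fit(decay, sizes, mean_p, p0=(1.0, 1e-3))
n_alpha = np.log(a_hat / 0.05) / c_hat      # size where the fitted curve crosses 0.05
print(f"fitted p(n) = {a_hat:.2f} * exp(-{c_hat:.2e} * n); crosses 0.05 near n = {n_alpha:.0f}")
```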


2018 ◽  
Author(s):  
Christopher Chabris ◽  
Patrick Ryan Heck ◽  
Jaclyn Mandart ◽  
Daniel Jacob Benjamin ◽  
Daniel J. Simons

Williams and Bargh (2008) reported that holding a hot cup of coffee caused participants to judge a person’s personality as warmer, and that holding a therapeutic heat pad caused participants to choose rewards for other people rather than for themselves. These experiments featured large effects (r = .28 and .31), small sample sizes (41 and 53 participants), and barely statistically significant results. We attempted to replicate both experiments in field settings with more than triple the sample sizes (128 and 177) and double-blind procedures, but found near-zero effects (r = –.03 and .02). In both cases, Bayesian analyses suggest there is substantially more evidence for the null hypothesis of no effect than for the original physical warmth priming hypothesis.
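To show how such evidence for a null can be quantified (our sketch, not the authors' analysis code), the default JZS Bayes factor for a two-sample t test can be computed by numerical integration following Rouder et al. (2009); BF01 > 1 means the data favor the null. The t value and group split below are illustrative.

```python
# Default JZS Bayes factor (Rouder et al., 2009) for a two-sample t test.
import numpy as np
from scipy import integrate

def jzs_bf01(t, n1, n2, r=np.sqrt(2) / 2):
    n_eff = n1 * n2 / (n1 + n2)              # effective sample size
    nu = n1 + n2 - 2                         # degrees of freedom
    like_h0 = (1 + t**2 / nu) ** (-(nu + 1) / 2)
    def integrand(g):                        # marginal likelihood under H1,
        c = 1 + n_eff * g * r**2             # integrating over the Cauchy effect-size prior
        return (c ** -0.5 * (1 + t**2 / (c * nu)) ** (-(nu + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))
    like_h1, _ = integrate.quad(integrand, 0, np.inf)
    return like_h0 / like_h1

# A near-zero result with the first replication's total n (illustrative 64/64 split)
print(f"BF01 = {jzs_bf01(t=-0.3, n1=64, n2=64):.2f}")
```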


2020 ◽  
Author(s):  
Seng Bum Michael Yoo ◽  
Benjamin Hayden ◽  
John Pearson

Humans and other animals evolved to make decisions that extend over time with continuous and ever-changing options. Nonetheless, the academic study of decision-making is mostly limited to the simple case of choice between two options. Here we advocate that the study of choice should expand to include continuous decisions. Continuous decisions, by our definition, involve a continuum of possible responses and take place over an extended period of time during which the response is continuously subject to modification. In most continuous decisions, the range of options can fluctuate and is affected by recent responses, making consideration of reciprocal feedback between choices and the environment essential. The study of continuous decisions raises new questions, such as how abstract processes of valuation and comparison are co-implemented with action planning and execution, how we simulate the large number of possible futures our choices lead to, and how our brains employ hierarchical structure to make choices more efficiently. While microeconomic theory has proven invaluable for discrete decisions, we propose that engineering control theory may serve as a better foundation for continuous ones. And while the concept of value has proven foundational for discrete decisions, goal states and policies may prove more useful for continuous ones.
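As a toy illustration of the control-theoretic framing the authors advocate (our sketch; the dynamics and cost matrices are invented): a linear-quadratic regulator continuously revises a response to track a goal state, so the object of optimization is a feedback policy rather than a discrete option value.

```python
# Toy continuous decision as discrete-time LQR: steer toward a goal state
# (the origin) under quadratic costs on deviation and effort.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 1.0], [0.0, 1.0]])    # position-velocity dynamics (assumed)
B = np.array([[0.0], [1.0]])              # the action adjusts velocity
Q = np.diag([1.0, 0.1])                   # cost of deviating from the goal state
R = np.array([[0.5]])                     # cost of effortful responses

P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # optimal feedback policy

x = np.array([5.0, 0.0])                  # start far from the goal
for step in range(20):
    u = -K @ x                            # response continuously revised from feedback
    x = A @ x + B @ u                     # environment updates with the choice
print(f"state after 20 steps: {np.round(x, 3)}")
```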

