Happy Hour? A Preliminary Study of the Effect of Induced Joviality and Sadness on Beer Perception

Beverages ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. 35 ◽  
Author(s):  
Beth Desira ◽  
Shaun Watson ◽  
George Van Doorn ◽  
Justin Timora ◽  
Charles Spence

Our emotions influence our perception. In order to determine whether emotion influences the perception of beer, 32 participants watched either a scene from the movie WALL-E to induce joviality, or a short clip from The Shawshank Redemption to induce sadness. The participants were then required to sample up to 250 mL of Yenda Pale Ale beer and rate it on a variety of taste and flavor characteristics (e.g., bitterness), before completing the Positive and Negative Affect Schedule-X (PANAS-X). The data were analyzed using Bayesian t-tests and Null Hypothesis Significance Tests (NHSTs). After applying conservative corrections for multiple comparisons, NHSTs failed to reach statistical significance. However, the effect sizes suggested that inducing joviality, relative to inducing sadness, resulted in the beer being rated as (a) tasting more pleasant, (b) tasting sweeter, and (c) being of higher quality. Following the induction of joviality, participants were also willing to pay more for the beer. The Bayesian analyses indicated that induced emotion can influence flavor perception for complex taste stimuli. The effect sizes and Bayesian analyses are interpreted in terms of Feelings-as-Information theory. These preliminary findings can tentatively be applied to real-world environments such as venues that serve and/or market alcohol.
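As an illustration of the kind of analysis the abstract describes, here is a minimal sketch of a two-sample NHST with an effect size and a conservative multiple-comparison correction. The ratings below are simulated, hypothetical values, not the study's data, and k = 5 rating scales is an arbitrary assumption for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical pleasantness ratings (0-100) for the two mood-induction
# groups; simulated values, not the study's actual data.
joviality = rng.normal(70, 10, size=16)
sadness = rng.normal(62, 10, size=16)

# Two-sample t-test (the NHST step)
t_stat, p_value = stats.ttest_ind(joviality, sadness)

# Cohen's d: mean difference scaled by the pooled standard deviation
pooled_sd = np.sqrt((joviality.var(ddof=1) + sadness.var(ddof=1)) / 2)
cohens_d = (joviality.mean() - sadness.mean()) / pooled_sd

# Conservative Bonferroni correction for k rating scales tested on the
# same sample (k = 5 is an assumed number, not taken from the study)
k = 5
p_corrected = min(p_value * k, 1.0)
```

A Bayesian t-test, as used in the study, would instead report a Bayes factor quantifying the evidence for a group difference relative to no difference; the two approaches can disagree, which is why the abstract reports both.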

Author(s):  
H. S. Steyn ◽  
S. M. Ellis

The determination of significance of differences in means and of relationships between variables is of importance in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. In studies based on probability samples, effect size indices should be reported in addition to statistical significance tests in order to comment on practical significance. Where complete populations or convenience samples are used, the determination of statistical significance is, strictly speaking, no longer relevant, while the effect size indices can be used as a basis to judge significance. In this article attention is paid to the use of effect size indices in order to establish practical significance. It is also shown how these indices are utilized in a few fields of statistical application and how they receive attention in the statistical literature and in computer packages. The use of effect sizes is illustrated by a few examples from the research literature.


Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (American Statistical Association, Wasserstein & Lazar 2016). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values can tell little about reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Also significance (p ≤ 0.05) is hardly replicable: at a realistic statistical power of 40%, given that there is a true effect, only one in six studies will significantly replicate the significant result of another study. Even at a good power of 80%, results from two studies will be conflicting, in terms of significance, in one third of the cases if there is a true effect. This means that a replication cannot be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgement based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to publication bias against nonsignificant findings. Data dredging, p-hacking and publication bias should be addressed by removing fixed significance thresholds. 
Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Also larger p-values offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis, falsely concluding that 'there is no effect'. Information on possible true effect sizes that are compatible with the data must be obtained from the observed effect size, e.g., from a sample average, and from a measure of uncertainty, such as a confidence interval. We review how confusion about interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, such as 'we need more stringent decision rules', 'sample sizes will decrease' or 'we need to get rid of p-values'.
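The replication arithmetic in this abstract follows from treating each independent study's chance of reaching significance, given a true effect, as its statistical power. A quick check of the two figures quoted above:

```python
# Given a true effect, assume each independent study reaches p <= 0.05
# with probability equal to its statistical power.

def both_significant(power):
    """Probability that two independent studies are both significant."""
    return power ** 2

def conflicting(power):
    """Probability that exactly one of two studies is significant."""
    return 2 * power * (1 - power)

# At 40% power: 0.4 * 0.4 = 0.16, i.e. about one in six replication
# attempts will significantly replicate a significant result.
p_replicate = both_significant(0.4)

# At 80% power: 2 * 0.8 * 0.2 = 0.32, i.e. about one third of study
# pairs will disagree about significance despite a true effect.
p_conflict = conflicting(0.8)
```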


1998 ◽  
Vol 15 (2) ◽  
pp. 103-118 ◽  
Author(s):  
Vinson H. Sutlive ◽  
Dale A. Ulrich

The unqualified use of statistical significance tests for interpreting the results of empirical research has been called into question by researchers in a number of behavioral disciplines. This paper reviews what statistical significance tells us and what it does not, with particular attention paid to criticisms of using the results of these tests as the sole basis for evaluating the overall significance of research findings. In addition, implications for adapted physical activity research are discussed. Based on the recent literature of other disciplines, several recommendations for evaluating and reporting research findings are made. They include calculating and reporting effect sizes, selecting an alpha level larger than the conventional .05 level, placing greater emphasis on replication of results, evaluating results in a sample size context, and employing simple research designs. Adapted physical activity researchers are encouraged to use specific modifiers when describing findings as significant.


1998 ◽  
Vol 21 (2) ◽  
pp. 218-219
Author(s):  
Michael G. Shafto

Chow's book provides a thorough analysis of the confusing array of issues surrounding conventional tests of statistical significance. This book should be required reading for behavioral and social scientists. Chow concludes that the null-hypothesis significance-testing procedure (NHSTP) plays a limited, but necessary, role in the experimental sciences. Another possibility is that – owing in part to its metaphorical underpinnings and convoluted logic – the NHSTP is declining in importance in those few sciences in which it ever played a role.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3544 ◽  
Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values tell little about reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Also significance (p ≤ 0.05) is hardly replicable: at a good statistical power of 80%, two studies will be ‘conflicting’, meaning that one is significant and the other is not, in one third of the cases if there is a true effect. A replication can therefore not be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. 
Also larger p-values offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis, falsely concluding that ‘there is no effect’. Information on possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., from a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should rather be more stringent, that sample sizes could decrease, or that p-values should better be completely abandoned. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment.



Author(s):  
Freddy A. Paniagua

Ferguson (2015) observed that the proportion of studies supporting the experimental hypothesis and rejecting the null hypothesis is very high. This paper argues that the reason for this scenario is that researchers in the behavioral sciences have learned that the null hypothesis can always be rejected if one knows the statistical tricks to reject it (e.g., the probability of rejecting the null hypothesis increases with p = 0.05 compared to p = 0.01). Examples of the advancement of science without the need to formulate the null hypothesis are also discussed, as well as alternatives to null hypothesis significance testing (NHST), such as effect sizes, and the importance of distinguishing the statistical significance from the practical significance of results.


1997 ◽  
Vol 8 (1) ◽  
pp. 12-15 ◽  
Author(s):  
Robert P. Abelson

Criticisms of null-hypothesis significance tests (NHSTs) are reviewed. Used as formal, two-valued decision procedures, they often generate misleading conclusions. However, critics who argue that NHSTs are totally meaningless because the null hypothesis is virtually always false are overstating their case. Critics also neglect the whole class of valuable significance tests that assess goodness of fit of models to data. Even as applied to simple mean differences, NHSTs can be rhetorically useful in defending research against criticisms that random factors adequately explain the results, or that the direction of the mean difference was not demonstrated convincingly. Principled argument and counterargument produce the lore, or communal understanding, in a field, which in turn helps guide new research. Alternative procedures (confidence intervals, effect sizes, and meta-analysis) are discussed. Although these alternatives are not totally free from criticism either, they deserve more frequent use, without an unwise ban on NHSTs.


1998 ◽  
Vol 21 (2) ◽  
pp. 213-213
Author(s):  
Marks R. Nester

Chow's one-tailed null-hypothesis significance-test procedure, with its rationale based on the elimination of chance influences, is not appropriate for theory-corroboration experiments. Estimated effect sizes and their associated standard errors or confidence limits will always suffice.

