Reaction times and other skewed distributions: problems with the mean and the median

2018 ◽  
Author(s):  
Guillaume A. Rousselet ◽  
Rand R. Wilcox

To summarise skewed (asymmetric) distributions, such as reaction times, typically the mean or the median is used as a measure of central tendency. Using the mean might seem surprising, given that it provides a poor measure of central tendency for skewed distributions, whereas the median provides a better indication of the location of the bulk of the observations. However, the sample median is biased: with small sample sizes, it tends to overestimate the population median. This is not the case for the mean. Based on this observation, Miller (1988) concluded that “sample medians must not be used to compare reaction times across experimental conditions when there are unequal numbers of trials in the conditions.” Here we replicate and extend Miller (1988), and demonstrate that his conclusion was ill-advised for several reasons. First, the median’s bias can be corrected using a percentile bootstrap bias correction. Second, a careful examination of the sampling distributions reveals that the sample median is median unbiased, whereas the mean is median biased when dealing with skewed distributions. That is, the sample mean estimates the population mean correctly on average (in expectation), but in a typical individual sample it does not. In addition, simulations of false and true positives in various situations show that no method dominates. Crucially, neither the mean nor the median is sufficient or even necessary to compare skewed distributions. Different questions require different methods and it would be unwise to use the mean or the median in all situations. Better tools are available to get a deeper understanding of how distributions differ: we illustrate a powerful alternative that relies on quantile estimation. All the code and data to reproduce the figures and analyses in the article are available online.
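The small-sample bias of the median, and the percentile-bootstrap correction the abstract mentions, can be reproduced in a short simulation. The sketch below is illustrative only, not the authors' code: it uses a standard exponential distribution (population median ln 2 ≈ 0.693) rather than their reaction-time models, and all sample sizes and iteration counts are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)

def bc_median(x, n_boot=100):
    # Percentile-bootstrap bias correction: estimate the bias as
    # (mean of bootstrap medians - sample median), then subtract it,
    # which simplifies to 2*M - mean(bootstrap medians).
    m = statistics.median(x)
    boot = [statistics.median(random.choices(x, k=len(x))) for _ in range(n_boot)]
    return 2.0 * m - statistics.mean(boot)

n, n_sim = 5, 1000
raw, corrected = [], []
for _ in range(n_sim):
    x = [random.expovariate(1.0) for _ in range(n)]
    raw.append(statistics.median(x))
    corrected.append(bc_median(x))

mean_raw = statistics.mean(raw)         # overestimates ln 2 at n = 5
mean_corr = statistics.mean(corrected)  # correction pulls the estimate back down
print(f"target {math.log(2):.3f}  raw {mean_raw:.3f}  corrected {mean_corr:.3f}")
```

For n = 5 exponential samples the expected sample median is about 0.78 versus a population median of about 0.69, so the upward bias is visible even in a modest simulation; the bias-corrected estimate should land closer to the target.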

2019 ◽  
Author(s):  
Guillaume A Rousselet ◽  
Rand R. Wilcox

To summarise skewed (asymmetric) distributions, such as reaction times, typically the mean or the median is used as a measure of central tendency. Using the mean might seem surprising, given that it provides a poor measure of central tendency for skewed distributions, whereas the median provides a better indication of the location of the bulk of the observations. However, the sample median is biased: with small sample sizes, it tends to overestimate the population median. This is not the case for the mean. Based on this observation, Miller (1988) concluded that “sample medians must not be used to compare reaction times across experimental conditions when there are unequal numbers of trials in the conditions.” Here we replicate and extend Miller (1988), and demonstrate that his conclusion was ill-advised for several reasons. First, the median’s bias can be corrected using a percentile bootstrap bias correction. Second, a careful examination of the sampling distributions reveals that the sample median is median unbiased, whereas the mean is median biased when dealing with skewed distributions. That is, the sample mean estimates the population mean correctly on average (in expectation), but in a typical individual sample it does not. In addition, simulations of false and true positives in various situations show that no method dominates. Crucially, neither the mean nor the median is sufficient or even necessary to compare skewed distributions. Different questions require different methods and it would be unwise to use the mean or the median in all situations. Better tools are available to get a deeper understanding of how distributions differ: we illustrate the hierarchical shift function, a powerful alternative that relies on quantile estimation. All the code and data to reproduce the figures and analyses in the article are available online.


2020 ◽  
Vol 4 ◽  
Author(s):  
Guillaume A Rousselet ◽  
Rand R Wilcox

To summarise skewed (asymmetric) distributions, such as reaction times, typically the mean or the median is used as a measure of central tendency. Using the mean might seem surprising, given that it provides a poor measure of central tendency for skewed distributions, whereas the median provides a better indication of the location of the bulk of the observations. However, the sample median is biased: with small sample sizes, it tends to overestimate the population median. This is not the case for the mean. Based on this observation, Miller (1988) concluded that “sample medians must not be used to compare reaction times across experimental conditions when there are unequal numbers of trials in the conditions.” Here we replicate and extend Miller (1988), and demonstrate that his conclusion was ill-advised for several reasons. First, the median’s bias can be corrected using a percentile bootstrap bias correction. Second, a careful examination of the sampling distributions reveals that the sample median is median unbiased, whereas the mean is median biased when dealing with skewed distributions. That is, the sample mean estimates the population mean correctly on average (in expectation), but in a typical individual sample it does not. In addition, simulations of false and true positives in various situations show that no method dominates. Crucially, neither the mean nor the median is sufficient or even necessary to compare skewed distributions. Different questions require different methods and it would be unwise to use the mean or the median in all situations. Better tools are available to get a deeper understanding of how distributions differ: we illustrate the hierarchical shift function, a powerful alternative that relies on quantile estimation. All the code and data to reproduce the figures and analyses in the article are available online.
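The hierarchical shift function compares conditions at several quantiles rather than at a single central value. A minimal, non-hierarchical sketch of the underlying idea follows, using invented data (two skewed samples that share a bulk but differ in the right tail; the distributions, sample sizes, and mixture weights are all arbitrary, not the authors' setup):

```python
import random
import statistics

random.seed(0)

# Condition A: skewed "reaction times"; condition B: same bulk, heavier right tail.
a = [300 + random.expovariate(1 / 50) for _ in range(5000)]
b = [300 + random.expovariate(1 / 50)
     + (random.expovariate(1 / 100) if random.random() < 0.2 else 0.0)
     for _ in range(5000)]

# Nine deciles per condition; their differences form a simple shift function.
dec_a = statistics.quantiles(a, n=10)
dec_b = statistics.quantiles(b, n=10)
shift = [db - da for da, db in zip(dec_a, dec_b)]
print([round(s, 1) for s in shift])  # differences grow toward the upper deciles
```

The decile differences reveal that the two conditions diverge mostly in their right tails, a pattern a single mean or median comparison would hide. A full shift function would add a confidence interval per decile (e.g., via the bootstrap), and the hierarchical version would additionally model trial-level and participant-level variability.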


2020 ◽  
Author(s):  
Jeff Miller

Contrary to the warning of Miller (1988), Rousselet and Wilcox (2020) argued that it is better to summarize each participant’s single-trial reaction times (RTs) in a given condition with the median than with the mean when comparing the central tendencies of RT distributions across experimental conditions. They acknowledged that median RTs can produce inflated Type I error rates when conditions differ in the number of trials tested, consistent with Miller’s warning, but they showed that the bias responsible for this error rate inflation could be eliminated with a bootstrap bias correction technique. The present simulations extend their analysis by examining the power of bias-corrected medians to detect true experimental effects and by comparing this power with the power of analyses using means and regular medians. Unfortunately, although bias-corrected medians solve the problem of inflated Type I error rates, their power is lower than that of means or regular medians in many realistic situations. In addition, even when conditions do not differ in the number of trials tested, the power of tests (e.g., t-tests) is generally lower using medians rather than means as the summary measures. Thus, the present simulations demonstrate that summary means will often provide the most powerful test for differences between conditions, and they show what aspects of the RT distributions determine the size of the power advantage for means.
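The error-rate inflation at issue stems from the fact that the median's bias depends on the number of trials: under the null, conditions with unequal trial counts have systematically different expected sample medians. A hypothetical illustration of that mechanism (exponential trials and arbitrary trial counts, not Miller's ex-Gaussian simulations):

```python
import math
import random
import statistics

random.seed(1)

def mean_of_medians(n_trials, n_sim=4000):
    # Average sample median across simulated "participants",
    # all drawing from the same Exp(1) null distribution.
    return statistics.mean(
        statistics.median(random.expovariate(1.0) for _ in range(n_trials))
        for _ in range(n_sim)
    )

m_small = mean_of_medians(10)  # noticeably above the population median ln 2
m_large = mean_of_medians(80)  # much closer to ln 2
print(f"n=10: {m_small:.3f}   n=80: {m_large:.3f}   target: {math.log(2):.3f}")
```

Because the two summaries differ systematically even when the underlying distributions are identical, a test comparing them across conditions rejects the null too often, which is the Type I inflation both Miller and Rousselet and Wilcox discuss.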


1990 ◽  
Vol 24 (1) ◽  
pp. 1-4 ◽  
Author(s):  
R. Sanz Sampelayo ◽  
J. Fonolla ◽  
F. Gil Extremera

A study was carried out to examine the distribution of individual weights in Helix aspersa snails, with the aims of establishing the best estimate of ponderal growth and obtaining a model growth curve. Four groups of 20 snails from the same clutch were kept under experimental conditions and analysed from birth up to 6 months. The variability of individual weights within groups was studied by calculating coefficients of variation every 15 days, and the assumed normal distribution of those weights was tested at the same intervals. The coefficients of variation increased with age, and the assumption of normally distributed individual weights had to be rejected. By means of a log transformation of the original data, a model growth curve was constructed and used to assess the possibility of estimating age from weight. We conclude that the median weight, rather than the mean, is the better measure of central tendency to use until selected populations can be obtained. The difficulty of estimating age from weight is emphasized.
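The log transformation helps here because multiplicative (lognormal-like) growth data become symmetric on the log scale, where mean and median coincide. A generic illustration with simulated lognormal "weights" (the parameters are invented for the demonstration, not the snail data):

```python
import math
import random
import statistics

random.seed(3)

w = [random.lognormvariate(0.0, 0.8) for _ in range(10_000)]

# On the raw scale the long right tail drags the mean above the median...
print(round(statistics.mean(w), 3), round(statistics.median(w), 3))

# ...but on the log scale the distribution is symmetric (normal),
# so mean and median nearly coincide.
logs = [math.log(x) for x in w]
print(round(statistics.mean(logs), 3), round(statistics.median(logs), 3))
```

For a lognormal distribution the population mean exceeds the median by a factor of exp(sigma^2 / 2), about 1.38 with these parameters, which is why the median is the safer summary of the raw weights.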


2020 ◽  
Vol 18 (2) ◽  
pp. 2-16
Author(s):  
Christina Chatzipantsiou ◽  
Marios Dimitriadis ◽  
Manos Papadakis ◽  
Michail Tsagris

Re-sampling based statistical tests are known to be computationally heavy but reliable when only small sample sizes are available. Despite their nice theoretical properties, not much effort has been put into making them computationally efficient. Here, computationally efficient methods for calculating permutation-based p-values for the Pearson correlation coefficient and the two independent samples t-test are proposed. The approach is general and can be applied to other similar two-sample mean or two mean vector cases.
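One standard efficiency trick for the two-sample case: with the pooled data fixed, the pooled-variance t statistic is a monotone function of the absolute difference in group means, so each shuffle only needs a group sum rather than a full t computation. The sketch below illustrates that trick in plain Python; it is a hedged reconstruction of the general idea, and the specific method proposed in the paper may differ in detail.

```python
import random

random.seed(7)

def perm_pvalue(x, y, n_perm=999):
    """Two-sided permutation p-value for a difference in means.

    Because |t| increases with |mean(x) - mean(y)| when the pooled data are
    fixed, ranking shuffles by the mean difference gives the same p-value
    as ranking them by t, at a fraction of the cost.
    """
    pooled = x + y
    n1, n2 = len(x), len(y)
    total = sum(pooled)
    obs = abs(sum(x) / n1 - sum(y) / n2)
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        s1 = sum(pooled[:n1])                   # only one group sum per shuffle
        d = abs(s1 / n1 - (total - s1) / n2)
        hits += d >= obs
    return (hits + 1) / (n_perm + 1)            # add-one correction keeps p > 0

x = [random.gauss(0.0, 1.0) for _ in range(15)]
y = [random.gauss(3.0, 1.0) for _ in range(15)]  # large true shift
p = perm_pvalue(x, y)
print(p)
```

The paper's further point is that such cheap statistics can be computed for all permutations at once (e.g., via vectorised matrix operations), which this scalar loop only hints at.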


Author(s):  
Michael J. Kalsher ◽  
William G. Obenauer ◽  
Christopher F. Weiss

Debate continues regarding the relative effectiveness of the ANSI Z535 guidelines for the design and placement of warnings. Research shows consistent positive effects of these guidelines on precursors to warning compliance (e.g., noticing, reading, intended compliance), but less consistency on compliance behavior. Challenges in interpreting these findings stem from factors such as small sample sizes, varying research designs and experimental conditions, and treating the ANSI Z535 guidelines as a singular entity rather than as an integrative system of separable features. Here, we address these issues by testing perceptions of warning label effectiveness using a large sample (n = 533) and systematically manipulating variables cited in the Z535 guidelines. Collectively, we tested eight label designs for 2-drawer and 4-drawer file cabinets and found statistically significant relationships between design recommendations from the ANSI Z535 guidelines and perceptions of effectiveness. The presence of a warning header and a pictogram exerted the largest effects. Bulleted text and larger font size were also related to increased perceptions of effectiveness.


2020 ◽  
Vol 29 (9) ◽  
pp. 2520-2537 ◽  
Author(s):  
Sean McGrath ◽  
XiaoFei Zhao ◽  
Russell Steele ◽  
Brett D. Thombs ◽  
Andrea Benedetti ◽  
...  

Researchers increasingly use meta-analysis to synthesize the results of several studies in order to estimate a common effect. When the outcome variable is continuous, standard meta-analytic approaches assume that the primary studies report the sample mean and standard deviation of the outcome. However, when the outcome is skewed, authors sometimes summarize the data by reporting the sample median and one or both of (i) the minimum and maximum values and (ii) the first and third quartiles, but do not report the mean or standard deviation. To include these studies in meta-analysis, several methods have been developed to estimate the sample mean and standard deviation from the reported summary data. A major limitation of these widely used methods is that they assume that the outcome distribution is normal, which is unlikely to be tenable for studies reporting medians. We propose two novel approaches to estimate the sample mean and standard deviation when data are suspected to be non-normal. Our simulation results and empirical assessments show that the proposed methods often perform better than the existing methods when applied to non-normal data.
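For context, the widely used normality-based conversions (the kind the abstract says break down for skewed data) are simple closed forms. The sketch below uses the common large-sample versions, mean ≈ (q1 + median + q3)/3 and SD ≈ IQR/1.349; small-sample refinements exist but are omitted, and the proposed non-normal estimators from the paper are not reproduced here.

```python
import random
import statistics

random.seed(9)

def mean_sd_from_quartiles(q1, med, q3):
    # Normality-based estimators: for a normal distribution the IQR
    # spans 2 * 0.6745 standard deviations, hence the 1.349 divisor.
    return (q1 + med + q3) / 3.0, (q3 - q1) / 1.349

# On (near-)normal data the conversion is accurate...
normal = [random.gauss(10.0, 2.0) for _ in range(100_000)]
q1, med, q3 = statistics.quantiles(normal, n=4)
m_hat, s_hat = mean_sd_from_quartiles(q1, med, q3)
print(round(m_hat, 2), round(s_hat, 2))  # close to 10 and 2

# ...but on skewed data it underestimates the mean, the failure mode
# that motivates non-normal estimators.
skewed = [random.lognormvariate(0.0, 1.0) for _ in range(100_000)]
q1s, meds, q3s = statistics.quantiles(skewed, n=4)
m_skew, _ = mean_sd_from_quartiles(q1s, meds, q3s)
print(round(m_skew, 3), round(statistics.mean(skewed), 3))
```

The lognormal example makes the limitation concrete: the quartile-based estimate sits near the median, well below the true mean that the heavy right tail produces.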


1998 ◽  
Vol 25 (5) ◽  
pp. 475 ◽  
Author(s):  
D. G. Barratt

Information on the amount of vertebrate prey caught by house cats in Canberra was collected by recording prey deposited at cat owners’ residences over 12 months. The amount of prey taken was not significantly influenced by cat gender, age when neutered, or cat breed. Nor did belling or the number of meals provided per day have a significant influence on the amount of prey caught. The age of the cat and the proportion of nights spent outside explained approximately 11% of the variation in the amount of prey caught by individual cats. In all, 43% of variation in predation on introduced species (predominantly rodents) was explained by distance from potential prey source areas (i.e. rural/grassland habitat) and cat density. The mean number of prey reported per cat over 12 months (10.2) was significantly lower than mean predation per cat per year based on estimates made by cat owners before the prey survey began (23.3). Counts of the amount of prey caught by house cats were highly positively skewed. In all, 70% of cats were observed to catch less than 10 prey over 12 months, but for 6% of cats, more than 50 prey were recorded. Estimates of predation by house cats, particularly extrapolated estimates, should be treated with caution. The total number of prey caught by house cats in Canberra estimated using the sample median was approximately half the estimate based on the sample mean. Predation estimates alone do not prove that prey populations are detrimentally affected, especially in highly disturbed and modified environments such as suburbs. Impacts on native fauna are likely to be most significant in undisturbed habitat adjacent to new residential developments.
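The roughly two-fold gap between median- and mean-based totals is what happens whenever a highly right-skewed count distribution is extrapolated. A toy illustration with invented counts (not Barratt's data; the mixture weights and city-wide cat population are hypothetical): most cats catch little, a few catch a lot.

```python
import random
import statistics

random.seed(5)

# Hypothetical per-cat annual prey counts: 90% light hunters, 10% heavy.
counts = [int(random.expovariate(1 / 6)) if random.random() < 0.9
          else int(random.expovariate(1 / 60))
          for _ in range(500)]

n_city_cats = 40_000  # invented population size for the extrapolation
total_mean = n_city_cats * statistics.mean(counts)
total_median = n_city_cats * statistics.median(counts)
print(f"mean-based {total_mean:,.0f} vs median-based {total_median:,.0f}")
# The heavy tail inflates the mean, so the two city-wide estimates diverge.
```

Neither extrapolation is "correct" on its own; the divergence itself is the warning the abstract gives about treating such estimates with caution.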


Author(s):  
Petr Zvyagin ◽  
Jaakko Heinonen

Sets of measurements of underwater ridge parts usually contain a limited amount of data, so conclusions must be drawn from small sample sizes, and the chance of inaccurate estimation increases. This paper proposes using stochastic confidence regions to estimate the unknown parameters of keel depths. A lognormal distribution is assumed as the model for keel depths. Regions for the mean and standard deviation of keel depths are obtained from Mood's and minimum-area confidence regions for the parameters of the normally distributed random variable. A conservative safety probability of not exceeding the critical keel depth in one random interaction of the ridge with a structure is estimated. An algorithm for the statistical assessment of ice ridge keel data by means of confidence-region building is offered, and a set of ridge keel depths from the Gulf of Bothnia (Baltic Sea) is assessed.
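Under the lognormal keel-depth model, the point-estimate version of the exceedance probability is a one-liner. The sketch below shows only that step; the paper's actual contribution, propagating small-sample parameter uncertainty through confidence regions, is not reproduced, and the parameter values are invented for illustration.

```python
import math

def p_exceed(mu, sigma, d_crit):
    """P(keel depth > d_crit) when ln(depth) ~ Normal(mu, sigma)."""
    z = (math.log(d_crit) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # survival function 1 - Phi(z)

mu, sigma = math.log(10.0), 0.35  # hypothetical: median keel depth of 10 m
for d in (10.0, 15.0, 20.0):
    print(f"P(depth > {d:4.1f} m) = {p_exceed(mu, sigma, d):.4f}")
```

A conservative safety probability in the paper's spirit would evaluate this exceedance over the worst-case (mu, sigma) pair in the confidence region rather than at the point fit alone.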


2016 ◽  
Author(s):  
Brian Keith Lohman ◽  
Jesse N Weber ◽  
Daniel I Bolnick

RNAseq is a relatively new tool for ecological genetics that offers researchers insight into changes in gene expression in response to a myriad of natural or experimental conditions. However, standard RNAseq methods (e.g., Illumina TruSeq® or NEBNext®) can be cost prohibitive, especially when study designs require large sample sizes. Consequently, RNAseq is often underused as a method, or is applied to small sample sizes that confer poor statistical power. Low cost RNAseq methods could therefore enable far greater and more powerful applications of transcriptomics in ecological genetics and beyond. Standard mRNAseq is costly partly because one sequences portions of the full length of all transcripts. Such whole-mRNA data are redundant for estimates of relative gene expression. TagSeq is an alternative method that focuses sequencing effort on the 3′ end of mRNAs, thereby reducing the necessary sequencing depth per sample, and thus cost. Here we present a revised TagSeq protocol and compare its performance against NEBNext®, the gold-standard whole-mRNAseq method. We built both TagSeq and NEBNext® libraries from the same biological samples, each spiked with control RNAs. We found that TagSeq measured the control RNA distribution more accurately than NEBNext®, for a fraction of the cost per sample (~10%). The higher accuracy of TagSeq was particularly apparent for transcripts of moderate to low abundance. Technical replicates of TagSeq libraries were highly correlated with each other and with NEBNext® results. Overall, we show that our modified TagSeq protocol is an efficient alternative to traditional whole-mRNAseq, offering researchers comparable data at greatly reduced cost.

