The Elephant in the Corner: A Cautionary Tale About Measurement Error in Treatment Effects Models

2010 ◽  
Author(s):  
Daniel L. Millimet


2022 ◽
pp. 001316442110688
Author(s):  
Yasuo Miyazaki ◽  
Akihito Kamata ◽  
Kazuaki Uekawa ◽  
Yizhi Sun

This paper investigated the consequences of measurement error in the pretest on the estimate of the treatment effect in a pretest–posttest design with the analysis of covariance (ANCOVA) model, focusing on both the direction and magnitude of the resulting bias. Some prior studies have examined the magnitude of the bias due to measurement error and suggested ways to correct it. However, none of them clarified how the direction of the bias is affected by measurement error. This study analytically derived a formula for the asymptotic bias of the treatment effect estimate. The derived formula is a function of the reliability of the pretest, the standardized population group mean difference on the pretest, and the correlation between pretest and posttest true scores. It revealed a concerning consequence of ignoring measurement error in pretest scores: treatment effects can be overestimated or underestimated, and a positive treatment effect can even be estimated as negative under certain conditions. A simulation study was also conducted to verify the derived bias formula.
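The mechanism the abstract describes can be illustrated with a quick simulation: when the groups differ on the true pretest and ANCOVA adjusts for an error-contaminated pretest, the adjustment is incomplete and the treatment effect estimate absorbs part of the unadjusted group difference. All numbers below (reliability, group mean difference, true-score correlation, effect size) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # large n so the simulation approximates the asymptotic bias

# Illustrative values (not taken from the paper)
reliability = 0.6  # pretest reliability
delta = 0.5        # standardized group mean difference on the true pretest
rho = 0.8          # correlation between pretest and posttest true scores
tau = 0.3          # true treatment effect

g = rng.integers(0, 2, n)                        # group indicator
t_pre = rng.normal(delta * g, 1.0)               # true pretest scores
t_post = rho * t_pre + tau * g + rng.normal(0, np.sqrt(1 - rho**2), n)

# Observed pretest = true score + classical measurement error,
# with error variance chosen to hit the stated reliability
err_var = (1 - reliability) / reliability
x_pre = t_pre + rng.normal(0, np.sqrt(err_var), n)

# ANCOVA: regress posttest on group and the *observed* pretest
X = np.column_stack([np.ones(n), g, x_pre])
coef, *_ = np.linalg.lstsq(X, t_post, rcond=None)
print(f"true effect = {tau}, ANCOVA estimate = {coef[1]:.3f}")  # overestimated here
```

With these settings the estimate converges to tau + rho * (1 - reliability) * delta = 0.46 rather than 0.30; flipping the sign of `delta` would push the estimate below the true effect, matching the abstract's point that the direction of the bias depends on the sign of the group difference.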


2019 ◽  
Vol 28 (2) ◽  
pp. 263-283 ◽  
Author(s):  
Raymond Duch ◽  
Denise Laroze ◽  
Thomas Robinson ◽  
Pablo Beramendi

Experiments should be designed to facilitate the detection of experimental measurement error. To this end, we advocate the implementation of identical experimental protocols employing diverse experimental modes. We suggest iterative nonparametric estimation techniques for assessing the magnitude of heterogeneous treatment effects across these modes. And we propose two diagnostic strategies—measurement metrics embedded in experiments, and measurement experiments—that help assess whether any observed heterogeneity reflects experimental measurement error. To illustrate our argument, first we conduct and analyze results from four identical interactive experiments: in the lab; online with subjects from the CESS lab subject pool; online with an online subject pool; and online with MTurk workers. Second, we implement a measurement experiment in India with CESS Online subjects and MTurk workers.


Author(s):  
Charles M C Lee ◽  
Eric C So ◽  
Charles C Y Wang

Abstract We introduce a parsimonious framework for choosing among alternative expected-return proxies (ERPs) when estimating treatment effects. By comparing ERPs’ measurement error variances in the cross-section and in the time series, we provide new evidence on the relative performance of firm-level ERPs nominated by recent studies. Generally, “implied-cost-of-capital” metrics perform best in the time series, whereas “characteristic-based” proxies perform best in the cross-section. Factor-based ERPs, even the latest renditions, perform poorly. We revisit four prior studies that use ex ante ERPs and illustrate how this framework can potentially alter either the sign or the magnitude of prior inferences.


2020 ◽  
Vol 127 ◽  
pp. 104790
Author(s):  
Erwin Bulte ◽  
Salvatore Di Falco ◽  
Robert Lensink

Author(s):  
David Aaby ◽  
Juned Siddique

Abstract

Background: Lifestyle intervention studies often use self-reported measures of diet as an outcome variable to measure changes in dietary intake. The presence of measurement error in self-reported diet due to participants' failure to accurately report their diet is well known. Less familiar to researchers is differential measurement error, where the nature of the measurement error differs by treatment group and/or time. Differential measurement error is often present in intervention studies and can result in biased estimates of the treatment effect and reduced power to detect treatment effects. Investigators need to be aware of the impact of differential measurement error when designing intervention studies that use self-reported measures.

Methods: We use simulation to assess the consequences of differential measurement error on the ability to estimate treatment effects in a two-arm randomized trial with two time points. We simulate data under a variety of scenarios, focusing on how different factors affect power to detect a treatment effect, bias of the treatment effect, and coverage of the 95% confidence interval of the treatment effect. Simulations use realistic scenarios based on data from the Trials of Hypertension Prevention Study. Simulated sample sizes ranged from 110 to 380 per group.

Results: Realistic differential measurement error of the kind seen in lifestyle intervention studies can require an increased sample size to achieve 80% power to detect a treatment effect and may result in a biased estimate of the treatment effect.

Conclusions: Investigators designing intervention studies that use self-reported measures should take differential measurement error into account by increasing their sample size, incorporating an internal validation study, and/or identifying statistical methods to correct for differential measurement error.
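The bias mechanism described here is easy to demonstrate by simulation. The sketch below uses a simple change-score analysis of a two-arm, two-time-point trial with invented parameter values (not TOHP data): the intervention arm under-reports its outcome at follow-up, and the estimated treatment effect absorbs that reporting bias.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_trial(n_per_group=200, true_effect=-3.0, diff_bias=-2.0, reps=2000):
    """Two-arm trial analyzed by change scores; the intervention arm
    under-reports at follow-up by `diff_bias` (differential error).
    All parameter values are illustrative."""
    estimates = []
    for _ in range(reps):
        arm = np.repeat([0, 1], n_per_group)            # 0 = control, 1 = intervention
        base = rng.normal(100, 10, 2 * n_per_group)     # true baseline intake
        follow = base + true_effect * arm + rng.normal(0, 5, 2 * n_per_group)
        # Self-report adds noise at both time points, plus an extra
        # systematic bias in the intervention arm at follow-up only
        obs_base = base + rng.normal(0, 4, 2 * n_per_group)
        obs_follow = follow + rng.normal(0, 4, 2 * n_per_group) + diff_bias * arm
        change = obs_follow - obs_base
        estimates.append(change[arm == 1].mean() - change[arm == 0].mean())
    return float(np.mean(estimates))

avg_est = simulate_trial()
print(f"true effect = -3.0, average estimate = {avg_est:.2f}")
```

Averaged over replications, the estimate converges to `true_effect + diff_bias` (here about -5.0 instead of -3.0): the differential component is indistinguishable from a real treatment effect without an internal validation study or a correction method, which is the abstract's central warning.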


2020 ◽  
Vol 29 (4) ◽  
pp. 2109-2130
Author(s):  
Lauren Bislick

Purpose: This study continued Phase I investigation of a modified Phonomotor Treatment (PMT) Program on motor planning in two individuals with apraxia of speech (AOS) and aphasia and, with support from prior work, refined Phase I methodology for treatment intensity and duration, a measure of communicative participation, and the use of effect size benchmarks specific to AOS.

Method: A single-case experimental design with multiple baselines across behaviors and participants was used to examine acquisition, generalization, and maintenance of treatment effects 8–10 weeks posttreatment. Treatment was distributed 3 days a week, and duration of treatment was specific to each participant (criterion based). Experimental stimuli consisted of target sounds or clusters embedded in nonwords and real words, specific to each participant's deficit.

Results: Findings show improved repetition accuracy for targets in trained nonwords, generalization to targets in untrained nonwords and real words, and maintenance of treatment effects at 10 weeks posttreatment for one participant, with more variable outcomes for the other participant.

Conclusions: Results indicate that a modified version of PMT can promote generalization and maintenance of treatment gains for trained speech targets via a multimodal approach emphasizing repeated exposure and practice. While these results are promising, the frequent co-occurrence of AOS and aphasia warrants a treatment that addresses both motor planning and linguistic deficits. Thus, the application of traditional PMT with participant-specific modifications for AOS embedded into the treatment program may be a more effective approach. Future work will continue to examine and maximize improvements in motor planning while also treating anomia in aphasia.


1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Summary: Item parameters for several hundred items were estimated from empirical data on several thousand subjects. Estimates under the logistic one-parameter (1PL) and two-parameter (2PL) models were evaluated. However, model fit showed that only a subset of items complied sufficiently, so the well-fitting items were assembled into item banks. In several simulation studies, 5000 simulated responses were generated along with person parameters in accordance with a computerized adaptive testing (CAT) procedure. A general reliability of .80, equivalent to a standard error of measurement of .44, was used as the stopping rule to end CAT testing. We also recorded how often each item was used across all simulees. Person-parameter estimates based on CAT correlated higher than .90 with the simulated true values. For the 1PL-fitting item banks, most simulees needed more than 20 but fewer than 30 items to reach the preset level of measurement error. Testing based on item banks that complied with the 2PL, however, revealed that on average only 10 items were sufficient to end testing at the same measurement error level. Both results clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations in everyday use will show whether these trends hold up in practice. If so, CAT will become possible and reasonable with some 150 well-calibrated 2PL items.
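A minimal CAT loop of the kind described — maximum-information item selection under the 2PL, an EAP ability estimate, and a stopping rule of SEM ≤ .44 — can be sketched as follows. The item bank here is randomly generated for illustration; it stands in for, and is not, the empirically calibrated bank from the study.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2PL item bank (discrimination a, difficulty b)
n_items = 150
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)

grid = np.linspace(-4, 4, 161)       # theta grid for EAP estimation
prior = np.exp(-grid**2 / 2)         # standard-normal prior (unnormalized)

def p_correct(theta, a_i, b_i):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a_i * (theta - b_i)))

def cat_session(true_theta, max_se=0.44):
    """Administer items by maximum Fisher information until SEM <= max_se."""
    available = np.ones(n_items, dtype=bool)
    post = prior.copy()
    theta_hat, se, used = 0.0, np.inf, 0
    while se > max_se and available.any():
        p = p_correct(theta_hat, a, b)
        info = a**2 * p * (1 - p)                       # 2PL item information
        item = np.argmax(np.where(available, info, -np.inf))
        available[item] = False
        resp = rng.random() < p_correct(true_theta, a[item], b[item])
        like = p_correct(grid, a[item], b[item])
        post = post * (like if resp else 1 - like)
        post = post / post.sum()
        theta_hat = (grid * post).sum()                 # EAP estimate
        se = np.sqrt(((grid - theta_hat) ** 2 * post).sum())  # posterior SD as SEM
        used += 1
    return theta_hat, used

lengths = [cat_session(rng.normal())[1] for _ in range(200)]
print(f"mean test length under this 2PL bank: {np.mean(lengths):.1f} items")
```

With highly discriminating items available, the posterior SD drops below .44 after only a handful of administrations, which is consistent with the abstract's observation that a 2PL bank ends testing far sooner than a 1PL bank at the same measurement error level.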

