The Elephant in the Corner: A Cautionary Tale About Measurement Error in Treatment Effects Models

2010 ◽  
Author(s):  
Daniel L. Millimet


2022 ◽
pp. 001316442110688
Author(s):  
Yasuo Miyazaki ◽  
Akihito Kamata ◽  
Kazuaki Uekawa ◽  
Yizhi Sun

This paper investigated the consequences of measurement error in the pretest on the estimate of the treatment effect in a pretest–posttest design with the analysis of covariance (ANCOVA) model, focusing on both the direction and magnitude of the resulting bias. Some prior studies have examined the magnitude of the bias due to measurement error and suggested ways to correct it. However, none of them clarified how the direction of the bias is affected by measurement error. This study analytically derived a formula for the asymptotic bias of the treatment effect estimate. The derived formula is a function of the reliability of the pretest, the standardized population group mean difference on the pretest, and the correlation between pretest and posttest true scores. It revealed a concerning consequence of ignoring measurement error in pretest scores: treatment effects can be overestimated or underestimated, and a positive treatment effect can even be estimated as negative under certain conditions. A simulation study was also conducted to verify the derived bias formula.
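The mechanism the abstract describes can be illustrated with a quick simulation: when the groups differ on the true pretest and ANCOVA adjusts for an error-contaminated pretest, the adjustment is incomplete and the treatment effect estimate absorbs part of the unadjusted group difference. All numbers below (reliability, group mean difference, true-score correlation, effect size) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # large n so the simulation approximates the asymptotic bias

# Illustrative values (not taken from the paper)
reliability = 0.6  # pretest reliability
delta = 0.5        # standardized group mean difference on the true pretest
rho = 0.8          # correlation between pretest and posttest true scores
tau = 0.3          # true treatment effect

g = rng.integers(0, 2, n)                        # group indicator
t_pre = rng.normal(delta * g, 1.0)               # true pretest scores
t_post = rho * t_pre + tau * g + rng.normal(0, np.sqrt(1 - rho**2), n)

# Observed pretest = true score + classical measurement error,
# with error variance chosen to hit the stated reliability
err_var = (1 - reliability) / reliability
x_pre = t_pre + rng.normal(0, np.sqrt(err_var), n)

# ANCOVA: regress posttest on group and the *observed* pretest
X = np.column_stack([np.ones(n), g, x_pre])
coef, *_ = np.linalg.lstsq(X, t_post, rcond=None)
print(f"true effect = {tau}, ANCOVA estimate = {coef[1]:.3f}")  # overestimated here
```

With these settings the estimate converges to tau + rho * (1 - reliability) * delta = 0.46 rather than 0.30; flipping the sign of `delta` would push the estimate below the true effect, matching the abstract's point that the direction of the bias depends on the sign of the group difference.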


2019 ◽  
Vol 28 (2) ◽  
pp. 263-283 ◽  
Author(s):  
Raymond Duch ◽  
Denise Laroze ◽  
Thomas Robinson ◽  
Pablo Beramendi

Experiments should be designed to facilitate the detection of experimental measurement error. To this end, we advocate the implementation of identical experimental protocols employing diverse experimental modes. We suggest iterative nonparametric estimation techniques for assessing the magnitude of heterogeneous treatment effects across these modes. And we propose two diagnostic strategies—measurement metrics embedded in experiments, and measurement experiments—that help assess whether any observed heterogeneity reflects experimental measurement error. To illustrate our argument, first we conduct and analyze results from four identical interactive experiments: in the lab; online with subjects from the CESS lab subject pool; online with an online subject pool; and online with MTurk workers. Second, we implement a measurement experiment in India with CESS Online subjects and MTurk workers.


Author(s):  
Charles M C Lee ◽  
Eric C So ◽  
Charles C Y Wang

Abstract We introduce a parsimonious framework for choosing among alternative expected-return proxies (ERPs) when estimating treatment effects. By comparing ERPs’ measurement error variances in the cross-section and in the time series, we provide new evidence on the relative performance of firm-level ERPs nominated by recent studies. Generally, “implied-cost-of-capital” metrics perform best in the time series, whereas “characteristic-based” proxies perform best in the cross-section. Factor-based ERPs, even the latest renditions, perform poorly. We revisit four prior studies that use ex ante ERPs and illustrate how this framework can potentially alter either the sign or the magnitude of prior inferences.


2020 ◽  
Vol 127 ◽  
pp. 104790
Author(s):  
Erwin Bulte ◽  
Salvatore Di Falco ◽  
Robert Lensink

Author(s):  
David Aaby ◽  
Juned Siddique

Abstract

Background: Lifestyle intervention studies often use self-reported measures of diet as an outcome variable to measure changes in dietary intake. The presence of measurement error in self-reported diet due to participants' failure to accurately report their diet is well known. Less familiar to researchers is differential measurement error, where the nature of the measurement error differs by treatment group and/or time. Differential measurement error is often present in intervention studies and can result in biased estimates of the treatment effect and reduced power to detect treatment effects. Investigators need to be aware of the impact of differential measurement error when designing intervention studies that use self-reported measures.

Methods: We use simulation to assess the consequences of differential measurement error on the ability to estimate treatment effects in a two-arm randomized trial with two time points. We simulate data under a variety of scenarios, focusing on how different factors affect power to detect a treatment effect, bias of the treatment effect, and coverage of the 95% confidence interval of the treatment effect. Simulations use realistic scenarios based on data from the Trials of Hypertension Prevention Study. Simulated sample sizes ranged from 110 to 380 per group.

Results: Realistic differential measurement error of the kind seen in lifestyle intervention studies can require an increased sample size to achieve 80% power to detect a treatment effect and may result in a biased estimate of the treatment effect.

Conclusions: Investigators designing intervention studies that use self-reported measures should take differential measurement error into account by increasing their sample size, incorporating an internal validation study, and/or identifying statistical methods to correct for differential measurement error.
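The bias mechanism described here is easy to demonstrate by simulation. The sketch below uses a simple change-score analysis of a two-arm, two-time-point trial with invented parameter values (not TOHP data): the intervention arm under-reports its outcome at follow-up, and the estimated treatment effect absorbs that reporting bias.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_trial(n_per_group=200, true_effect=-3.0, diff_bias=-2.0, reps=2000):
    """Two-arm trial analyzed by change scores; the intervention arm
    under-reports at follow-up by `diff_bias` (differential error).
    All parameter values are illustrative."""
    estimates = []
    for _ in range(reps):
        arm = np.repeat([0, 1], n_per_group)            # 0 = control, 1 = intervention
        base = rng.normal(100, 10, 2 * n_per_group)     # true baseline intake
        follow = base + true_effect * arm + rng.normal(0, 5, 2 * n_per_group)
        # Self-report adds noise at both time points, plus an extra
        # systematic bias in the intervention arm at follow-up only
        obs_base = base + rng.normal(0, 4, 2 * n_per_group)
        obs_follow = follow + rng.normal(0, 4, 2 * n_per_group) + diff_bias * arm
        change = obs_follow - obs_base
        estimates.append(change[arm == 1].mean() - change[arm == 0].mean())
    return float(np.mean(estimates))

avg_est = simulate_trial()
print(f"true effect = -3.0, average estimate = {avg_est:.2f}")
```

Averaged over replications, the estimate converges to `true_effect + diff_bias` (here about -5.0 instead of -3.0): the differential component is indistinguishable from a real treatment effect without an internal validation study or a correction method, which is the abstract's central warning.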


2020 ◽  
Vol 29 (4) ◽  
pp. 2109-2130
Author(s):  
Lauren Bislick

Purpose: This study continued Phase I investigation of a modified Phonomotor Treatment (PMT) Program on motor planning in two individuals with apraxia of speech (AOS) and aphasia and, with support from prior work, refined Phase I methodology for treatment intensity and duration, a measure of communicative participation, and the use of effect size benchmarks specific to AOS.

Method: A single-case experimental design with multiple baselines across behaviors and participants was used to examine acquisition, generalization, and maintenance of treatment effects 8–10 weeks posttreatment. Treatment was distributed 3 days a week, and duration of treatment was specific to each participant (criterion based). Experimental stimuli consisted of target sounds or clusters embedded in nonwords and real words, specific to each participant's deficit.

Results: Findings show improved repetition accuracy for targets in trained nonwords, generalization to targets in untrained nonwords and real words, and maintenance of treatment effects at 10 weeks posttreatment for one participant, with more variable outcomes for the other participant.

Conclusions: Results indicate that a modified version of PMT can promote generalization and maintenance of treatment gains for trained speech targets via a multimodal approach emphasizing repeated exposure and practice. While these results are promising, the frequent co-occurrence of AOS and aphasia warrants a treatment that addresses both motor planning and linguistic deficits. Thus, the application of traditional PMT with participant-specific modifications for AOS embedded into the treatment program may be a more effective approach. Future work will continue to examine and maximize improvements in motor planning while also treating anomia in aphasia.


1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Summary: Item parameters for several hundred items were estimated from empirical data on several thousand subjects. Estimates under the logistic one-parameter (1PL) and two-parameter (2PL) models were evaluated. However, model fit showed that only a subset of items complied sufficiently, so the well-fitting items were assembled into item banks. In several simulation studies, 5000 simulated responses were generated along with person parameters in accordance with a computerized adaptive testing (CAT) procedure. A general reliability of .80, equivalent to a standard error of measurement of .44, was used as the stopping rule to end CAT testing. We also recorded how often each item was used across all simulees. Person-parameter estimates based on CAT correlated higher than .90 with the simulated true values. For the 1PL-fitting item banks, most simulees needed more than 20 but fewer than 30 items to reach the preset level of measurement error. Testing based on item banks that complied with the 2PL, however, revealed that on average only 10 items were sufficient to end testing at the same measurement error level. Both results clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations in everyday use will show whether these trends hold up in practice. If so, CAT will become possible and reasonable with some 150 well-calibrated 2PL items.
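A minimal CAT loop of the kind described — maximum-information item selection under the 2PL, an EAP ability estimate, and a stopping rule of SEM ≤ .44 — can be sketched as follows. The item bank here is randomly generated for illustration; it stands in for, and is not, the empirically calibrated bank from the study.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2PL item bank (discrimination a, difficulty b)
n_items = 150
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)

grid = np.linspace(-4, 4, 161)       # theta grid for EAP estimation
prior = np.exp(-grid**2 / 2)         # standard-normal prior (unnormalized)

def p_correct(theta, a_i, b_i):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a_i * (theta - b_i)))

def cat_session(true_theta, max_se=0.44):
    """Administer items by maximum Fisher information until SEM <= max_se."""
    available = np.ones(n_items, dtype=bool)
    post = prior.copy()
    theta_hat, se, used = 0.0, np.inf, 0
    while se > max_se and available.any():
        p = p_correct(theta_hat, a, b)
        info = a**2 * p * (1 - p)                       # 2PL item information
        item = np.argmax(np.where(available, info, -np.inf))
        available[item] = False
        resp = rng.random() < p_correct(true_theta, a[item], b[item])
        like = p_correct(grid, a[item], b[item])
        post = post * (like if resp else 1 - like)
        post = post / post.sum()
        theta_hat = (grid * post).sum()                 # EAP estimate
        se = np.sqrt(((grid - theta_hat) ** 2 * post).sum())  # posterior SD as SEM
        used += 1
    return theta_hat, used

lengths = [cat_session(rng.normal())[1] for _ in range(200)]
print(f"mean test length under this 2PL bank: {np.mean(lengths):.1f} items")
```

With highly discriminating items available, the posterior SD drops below .44 after only a handful of administrations, which is consistent with the abstract's observation that a 2PL bank ends testing far sooner than a 1PL bank at the same measurement error level.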

