rater drift
Recently Published Documents

TOTAL DOCUMENTS: 7 (FIVE YEARS: 0)
H-INDEX: 2 (FIVE YEARS: 0)
2017 ◽ Vol 42 (4) ◽ pp. 307-320
Author(s): Adrienne Sgammato, John R. Donoghue
When constructed response items are administered repeatedly, “trend scoring” can be used to test for rater drift. In trend scoring, raters rescore responses from the previous administration. Two simulation studies evaluated the utility of Stuart’s Q measure of marginal homogeneity as a way of evaluating rater drift when monitoring trend scoring. In the first study, data were generated based on trend scoring tables obtained from an operational assessment. The second study tightly controlled table margins to disentangle certain features present in the empirical data. In addition to Q, the paired t test was included as a comparison, because of its widespread use in monitoring trend scoring. Sample size, number of score categories, interrater agreement, and symmetry/asymmetry of the margins were manipulated. For identical margins, both statistics had good Type I error control. For a unidirectional shift in margins, both statistics had good power. As expected, when shifts in the margins were balanced across categories, the t test had little power. Q demonstrated good power for all conditions and identified almost all items identified by the t test. Q shows substantial promise for monitoring of trend scoring.
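Stuart's Q (the Stuart–Maxwell test) checks whether the two sets of score margins in a square original-score × rescore table differ, using a quadratic form in the margin differences with a chi-square reference distribution. A minimal sketch in Python; the 3-category table below is hypothetical and illustrative only, not data from the study:

```python
import numpy as np
from scipy import stats

def stuart_maxwell_q(table):
    """Stuart's Q test of marginal homogeneity for a square K x K table
    of (original score, rescore) counts.  Returns (Q, p-value)."""
    table = np.asarray(table, dtype=float)
    k = table.shape[0]
    d = table.sum(axis=1) - table.sum(axis=0)   # row margin minus column margin
    # covariance matrix of d under the null hypothesis of homogeneity
    s = -(table + table.T)
    np.fill_diagonal(s, table.sum(axis=1) + table.sum(axis=0) - 2 * np.diag(table))
    # drop one category so the covariance matrix is invertible
    d, s = d[:-1], s[:-1, :-1]
    q = d @ np.linalg.solve(s, d)
    p = stats.chi2.sf(q, df=k - 1)
    return q, p

# Hypothetical trend-scoring table (rows = original scores, cols = rescores):
table = [[40, 10,  2],
         [ 5, 50, 12],
         [ 1,  4, 30]]
q, p = stuart_maxwell_q(table)  # Q ≈ 5.46, p ≈ 0.065
```

Because Q pools margin shifts in all directions into one statistic, it retains power when upward and downward drift cancel across categories, which is exactly the case where the paired t test fails.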


2011 ◽ Vol 21 ◽ pp. S357-S358
Author(s): A. DeFries, B. Rothman, C. Yavorsky, M. Opler, J. Gordon, ...

2011 ◽ Vol 26 (S2) ◽ pp. 683-683
Author(s): B. Rothman, C. Yavorsky, A. De Fries, J. Gordon, M. Opler

Introduction/objectives/aims: Though rater drift in clinical trials has long been understood to negatively impact trial results, few studies have systematically quantified it. We examined training data for the HAM-D (Hamilton Depression Scale, 17-item version) at two time points to measure the impact.
Methods: Raters participating in a standardized training scored the HAM-D based on two videotaped interviews of depressed patients. To assess drift, data from an initial, post-online training session were compared with data obtained 12 months later. Intra-class correlation coefficients (Shrout & Fleiss, 1979) and concordance with expert ratings were compared.
Results: Intra-class correlation coefficients (ICC) for raters (n = 167) following initial training were good to excellent for individual raters (.695–.976, p < .0001) and good for the overall cohort (.752, p < .0001). Concordance with expert ratings was excellent at 99.3%. The overall ICC fell to .730 at the second assessment, and although the upper bound of individual performance remained in the good to excellent range, the frequency of scores in the poor to fair range (< .65) increased. Concordance also fell slightly, to 87%.
Conclusions: Rater drift occurred over 12 months, as gauged by reliability and concordance. Drift was apparent in only a limited portion of the cohort but still lowered the overall ICC at the second time point. Because studies are generally powered on the assumption that the ICC remains stable, this has implications for both the power calculation and the required sample size.
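The Shrout & Fleiss (1979) ICC(2,1) cited above treats both subjects and raters as random effects and is built from the mean squares of a two-way ANOVA. A minimal sketch in Python; the HAM-D totals below are hypothetical, invented only to exercise the formula:

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater,
    following Shrout & Fleiss (1979).  `ratings` is n subjects x k raters."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)                  # per-subject means
    col_means = x.mean(axis=0)                  # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects MS
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters MS
    sse = np.sum((x - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # residual MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical HAM-D totals: 6 videotaped interviews scored by 4 raters.
ratings = [[20, 21, 19, 21],
           [14, 15, 15, 14],
           [25, 24, 26, 25],
           [ 9, 10,  9, 11],
           [17, 18, 16, 17],
           [22, 22, 23, 21]]
icc = icc_2_1(ratings)
```

With large between-subject spread and small rater disagreement, as here, the estimate lands near the top of the "excellent" range; drift of the kind the abstract describes would show up as this value falling between the two assessments.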


2010 ◽ Vol 20 ◽ pp. S513-S514
Author(s): A. DeFries, C. Yavorsky, M. Opler, L. Ramadhar, E. Ivanova, ...

2009 ◽ Vol 46 (1) ◽ pp. 43-58
Author(s): Polina Harik, Brian E. Clauser, Irina Grabovsky, Ronald J. Nungester, Dave Swanson, ...
