A Comparison of Linking Methods for Two Groups for the Two-Parameter Logistic Item Response Model in the Presence and Absence of Random Differential Item Functioning

Foundations ◽  
2021 ◽  
Vol 1 (1) ◽  
pp. 116-144
Author(s):  
Alexander Robitzsch

This article investigates the comparison of two groups based on the two-parameter logistic item response model. It is assumed that there is random differential item functioning in item difficulties and item discriminations. The group difference is estimated using separate calibration with subsequent linking, as well as concurrent calibration. The following linking methods are compared: mean-mean linking, log-mean-mean linking, invariance alignment, Haberman linking, asymmetric and symmetric Haebara linking, different recalibration linking methods, anchored item parameters, and concurrent calibration. It is analytically shown that log-mean-mean linking and mean-mean linking provide consistent estimates if random DIF effects have zero means. The performance of the linking methods was evaluated through a simulation study. It turned out that (log-)mean-mean and Haberman linking performed best, followed by symmetric Haebara linking and a newly proposed recalibration linking method. Interestingly, linking methods frequently found in applications (i.e., asymmetric Haebara linking, recalibration linking used in a variant in current large-scale assessment studies, anchored item parameters, concurrent calibration) performed worse in the presence of random differential item functioning. In line with the previous literature, differences between linking methods turned out to be negligible in the absence of random differential item functioning. The different linking methods were also applied in an empirical example that performed a linking of PISA 2006 to PISA 2009 for Austrian students. This application showed that estimated trends in the means and standard deviations depended on the chosen linking method and the employed item response model.
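As a concrete illustration of the two simplest methods compared above, the following sketch implements mean-mean and log-mean-mean linking for 2PL item parameters from two separately calibrated groups. The parameterization (focal parameters transformed onto the reference scale, with the shift constant B read as the focal group's mean) and all names are illustrative assumptions, not the article's code.

```python
import numpy as np

def mean_mean_linking(a_ref, b_ref, a_foc, b_foc, log_means=False):
    """Link 2PL item parameters from a focal-group calibration onto the
    reference-group scale via the transformation theta -> A * theta + B.

    Mean-mean linking matches mean discriminations and mean difficulties;
    log-mean-mean matches geometric-mean discriminations instead.
    Returns (A, B); B is the focal group's mean on the reference scale.
    """
    a_ref, b_ref = np.asarray(a_ref, float), np.asarray(b_ref, float)
    a_foc, b_foc = np.asarray(a_foc, float), np.asarray(b_foc, float)
    if log_means:
        # ratio of geometric means of the discriminations
        A = np.exp(np.mean(np.log(a_foc)) - np.mean(np.log(a_ref)))
    else:
        # ratio of arithmetic means of the discriminations
        A = np.mean(a_foc) / np.mean(a_ref)
    # shift so that transformed mean difficulties coincide
    B = np.mean(b_ref) - A * np.mean(b_foc)
    return A, B
```

Under zero-mean random DIF in the difficulties, the DIF effects average out of B, which is the intuition behind the consistency result stated in the abstract.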

2019 ◽  
Vol 80 (3) ◽  
pp. 604-612
Author(s):  
Tenko Raykov ◽  
George A. Marcoulides

This note raises caution that a finding of a marked pseudo-guessing parameter for an item within a three-parameter item response model could be spurious in a population with substantial unobserved heterogeneity. A numerical example is presented wherein, in each of two classes, the two-parameter logistic model is used to generate the data on a multi-item measuring instrument, while the three-parameter logistic model is found to be associated with a considerable pseudo-guessing parameter estimate on an item. The implications of the reported results for empirical educational research are subsequently discussed.
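The mechanism can be illustrated numerically: a marginal item characteristic curve that averages two 2PL curves (two latent classes with different item parameters) stays bounded away from zero at low ability, which a 3PL fit would absorb into a pseudo-guessing parameter. The class parameters below are arbitrary illustrative values, not those of the note's example.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    """3PL item response function with lower asymptote c."""
    return c + (1.0 - c) * p_2pl(theta, a, b)

theta = np.linspace(-4.0, 4.0, 81)
# Two unobserved classes answering the same item: one finds it hard,
# the other finds it easy; the marginal curve mixes them 50/50.
marginal = 0.5 * p_2pl(theta, 1.5, 1.0) + 0.5 * p_2pl(theta, 1.0, -3.0)
```

At theta = -4 the marginal probability is about 0.13 even though neither class involves any guessing, so a 3PL model fit to the pooled data can report a sizeable pseudo-guessing estimate.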


1999 ◽  
Vol 24 (3) ◽  
pp. 293-322 ◽  
Author(s):  
Louis A. Roussos ◽  
Deborah L. Schnipke ◽  
Peter J. Pashley

The present study derives a general formula for the population parameter being estimated by the Mantel-Haenszel (MH) differential item functioning (DIF) statistic. Because the formula is general, it is appropriate for either uniform DIF (defined as a difference in item response theory item difficulty values) or nonuniform DIF, and it can be used regardless of the form of the item response function. In the case of uniform DIF modeled with two-parameter-logistic response functions, the parameter is well known to be linearly related to the difference in item difficulty between the focal and reference groups. Even though this relationship is known not to hold strictly in the case of three-parameter-logistic (3PL) uniform DIF, the degree of departure from this relationship has not been known and has generally been believed to be small. By evaluating the MH DIF parameter, we show that for items of medium or high difficulty, the parameter is much smaller in absolute value than expected based on the difference in item difficulty between the two groups. These results shed new light on results from previous simulation studies that showed the MH DIF statistic has a tendency to shrink toward zero with increasing difficulty level when used with 3PL data.
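For readers unfamiliar with the statistic, the MH common odds ratio and its ETS delta-scale transform can be computed as follows; the 2x2 tables are aggregated over matching-score strata (rows: reference/focal group, columns: correct/incorrect). This is a generic sketch of the standard estimator, not the authors' derivation.

```python
import numpy as np

def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio over score strata.

    tables: iterable of 2x2 arrays [[A, B], [C, D]] per stratum, where
    A/B are the reference group's correct/incorrect counts and
    C/D are the focal group's counts.
    """
    num = den = 0.0
    for t in tables:
        (A, B), (C, D) = np.asarray(t, float)
        n = A + B + C + D
        num += A * D / n
        den += B * C / n
    return num / den

def mh_d_dif(tables):
    """ETS delta-scale MH D-DIF statistic: -2.35 * ln(alpha_MH)."""
    return -2.35 * np.log(mh_odds_ratio(tables))
```

A value of 1 for the odds ratio (0 on the delta scale) indicates no DIF; values above 1 favor the reference group on the studied item.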



2019 ◽  
Vol 34 (6) ◽  
pp. 873-873
Author(s):  
W Goette ◽  
A Schmitt ◽  
J Nici

Abstract Objective To identify item parameter estimates for the Halstead Category Test (HCT). Previous item response analyses have been conducted on the HCT, but without implementing item response theory methods. Method Data were collected from a diagnostically heterogeneous sample of 211 adults (110 males, 101 females) referred for neuropsychological evaluation. The sample had an average educational attainment of 14.18 years (SD = 3.05 years) and an average age of 59.75 years (SD = 18.28). Responses from items on Subtests III-VII were dichotomously coded (0 = incorrect, 1 = correct). A two-parameter, hierarchical, logistic item response model was fit to the data using code in Stan, which uses an adaptive variant of Hamiltonian Monte Carlo. Results The model converged appropriately, with posterior estimates of item parameters all demonstrating adequate effective sample sizes (min. = 3485.74) and Rhat values (max. = 1.002). The posterior difficulty estimates ranged from -1.06 to 2.07 (III), -1.67 to 1.92 (IV), -3.80 to 2.62 (V), -2.35 to 4.38 (VI), and -2.28 to 1.80 (VII). The posterior discrimination estimates ranged from 0.20 to 5.41 (III), 0.35 to 8.17 (IV), 0.11 to 4.14 (V), 0.69 to 5.88 (VI), and 0.53 to 2.83 (VII). Conclusions The HCT demonstrates a wide range of item difficulties, with few items being excessively difficult, though some such items were identified in Subtest VI. Ranges for item discriminations are also wide, with some items returning high estimates, which may be related to the smaller sample size for a two-parameter model or to less-than-ideal item functioning. These findings support the longstanding sensitivity of the HCT to a variety of neurological conditions and across the severity spectrum.
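The core of the fitted model is the 2PL likelihood for a dichotomously coded response matrix; the hierarchical Stan model adds priors on the item parameters and samples the posterior with HMC, but the data model itself can be sketched as below (a generic 2PL log-likelihood, not the authors' Stan code).

```python
import numpy as np

def loglik_2pl(resp, theta, a, b):
    """2PL log-likelihood for a persons-by-items 0/1 response matrix.

    theta: person abilities, shape (n_persons,)
    a, b:  item discriminations and difficulties, shape (n_items,)
    """
    resp = np.asarray(resp, float)
    theta = np.asarray(theta, float)
    a, b = np.asarray(a, float), np.asarray(b, float)
    # P(correct) for every person-item pair via broadcasting
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    return float(np.sum(resp * np.log(p) + (1.0 - resp) * np.log1p(-p)))
```

An HMC sampler then explores the joint posterior of theta, a, and b under this likelihood together with the hierarchical priors.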


Mathematics ◽  
2021 ◽  
Vol 9 (13) ◽  
pp. 1465
Author(s):  
Alexander Robitzsch

This article shows that the recently proposed latent D-scoring model of Dimitrov is statistically equivalent to the two-parameter logistic item response model. An analytical derivation and a numerical illustration are employed for demonstrating this finding. Hence, estimation techniques for the two-parameter logistic model can be used for estimating the latent D-scoring model. In an empirical example using PISA data, differences of country ranks are investigated when using different metrics for the latent trait. In the example, the choice of the latent trait metric matters for the ranking of countries. Finally, it is argued that an item response model with bounded latent trait values like the latent D-scoring model might have advantages for reporting results in terms of interpretation.


Author(s):  
Dr. Wokoma T. Abbott

Differential item functioning (DIF) occurs as a result of differences in the person parameters of the individuals being examined, even when item parameters remain constant during testing. This postulate of item response theory (IRT) was demonstrated in this work. The study investigated whether DIF detection methods have the same DIF detection sensitivity. A comparative research design formed the framework of the study. Transformed item difficulties (TID), Mantel-Haenszel (MH), standardization, logistic regression, Raju's area, and Lord's chi-square methods were compared. The study used 400 Vocational One students (200 males as the reference group and 200 females as the focal group) in Rivers State, Nigeria. The multiple-choice computer science items of the 2019 junior school certificate examination (JSCE) were adapted as the instrument for data collection; they were administered to the students and scored dichotomously. Difficulty and discrimination parameters of the items were analyzed under the 2PL model of IRT with the ltm package, and ogives of the items were plotted with the ggplot2 package. Individual DIF methods and dichoDif in difR were used to detect DIF and compare the methods. The results revealed that all the items of the test functioned differently between the reference group and the focal group, as shown in the item characteristic curves (ICCs). In the comparison of the DIF detection methods, the standardization method detected the most DIF items, followed by the logistic regression method and then Lord's chi-square method; the transformed item difficulties method detected more than the Mantel-Haenszel method, and Raju's area method could not detect any. In light of these findings, it was recommended that the best DIF detection methods (possibly a combination of them) be used to identify DIF items in tests. KEYWORDS: Item response theory, differential item functioning, item characteristic curve, item parameters.
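As a sketch of one of the compared procedures, the logistic regression method tests an item for DIF by comparing a matching-score-only model against one that adds group and score-by-group terms (capturing uniform and nonuniform DIF, respectively). The Newton-Raphson fitter and the thresholds below are illustrative assumptions, not the study's analysis code.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        H = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta += np.linalg.solve(H, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return beta, float(np.sum(y * np.log(p) + (1.0 - y) * np.log1p(-p)))

def lr_dif_statistic(score, group, item):
    """Likelihood-ratio DIF test: G2 is approximately chi-square (2 df) under no DIF."""
    one = np.ones_like(score)
    _, ll0 = fit_logistic(np.column_stack([one, score]), item)
    _, ll1 = fit_logistic(np.column_stack([one, score, group, score * group]), item)
    return 2.0 * (ll1 - ll0)
```

A large G2 relative to the chi-square(2) reference distribution flags the item as functioning differently for the focal group after matching on the score.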

