A Comparison of Linking Methods for Two Groups for the Two-Parameter Logistic Item Response Model in the Presence and Absence of Random Differential Item Functioning

Foundations ◽  
2021 ◽  
Vol 1 (1) ◽  
pp. 116-144
Author(s):  
Alexander Robitzsch

This article investigates the comparison of two groups based on the two-parameter logistic item response model. It is assumed that there is random differential item functioning in item difficulties and item discriminations. The group difference is estimated using separate calibration with subsequent linking, as well as concurrent calibration. The following linking methods are compared: mean-mean linking, log-mean-mean linking, invariance alignment, Haberman linking, asymmetric and symmetric Haebara linking, different recalibration linking methods, anchored item parameters, and concurrent calibration. It is analytically shown that log-mean-mean linking and mean-mean linking provide consistent estimates if random DIF effects have zero means. The performance of the linking methods was evaluated through a simulation study. It turned out that (log-)mean-mean and Haberman linking performed best, followed by symmetric Haebara linking and a newly proposed recalibration linking method. Interestingly, linking methods frequently found in applications (i.e., asymmetric Haebara linking, recalibration linking used in a variant in current large-scale assessment studies, anchored item parameters, concurrent calibration) performed worse in the presence of random differential item functioning. In line with the previous literature, differences between linking methods turned out to be negligible in the absence of random differential item functioning. The different linking methods were also applied in an empirical example that performed a linking of PISA 2006 to PISA 2009 for Austrian students. This application showed that estimated trends in the means and standard deviations depended on the chosen linking method and the employed item response model.
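As a concrete illustration of the two simplest methods compared above, the following sketch implements mean-mean and log-mean-mean linking for 2PL item parameters from two separately calibrated groups. The parameterization (focal parameters transformed onto the reference scale, with the shift constant B read as the focal group's mean) and all names are illustrative assumptions, not the article's code.

```python
import numpy as np

def mean_mean_linking(a_ref, b_ref, a_foc, b_foc, log_means=False):
    """Link 2PL item parameters from a focal-group calibration onto the
    reference-group scale via the transformation theta -> A * theta + B.

    Mean-mean linking matches mean discriminations and mean difficulties;
    log-mean-mean matches geometric-mean discriminations instead.
    Returns (A, B); B is the focal group's mean on the reference scale.
    """
    a_ref, b_ref = np.asarray(a_ref, float), np.asarray(b_ref, float)
    a_foc, b_foc = np.asarray(a_foc, float), np.asarray(b_foc, float)
    if log_means:
        # ratio of geometric means of the discriminations
        A = np.exp(np.mean(np.log(a_foc)) - np.mean(np.log(a_ref)))
    else:
        # ratio of arithmetic means of the discriminations
        A = np.mean(a_foc) / np.mean(a_ref)
    # shift so that transformed mean difficulties coincide
    B = np.mean(b_ref) - A * np.mean(b_foc)
    return A, B
```

Under zero-mean random DIF in the difficulties, the DIF effects average out of B, which is the intuition behind the consistency result stated in the abstract.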

2019 ◽  
Vol 80 (3) ◽  
pp. 604-612
Author(s):  
Tenko Raykov ◽  
George A. Marcoulides

This note raises caution that a finding of a marked pseudo-guessing parameter for an item within a three-parameter item response model could be spurious in a population with substantial unobserved heterogeneity. A numerical example is presented wherein, in each of two classes, the two-parameter logistic model is used to generate the data on a multi-item measuring instrument, while the three-parameter logistic model is found to be associated with a considerable pseudo-guessing parameter estimate on an item. The implications of the reported results for empirical educational research are subsequently discussed.
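The mechanism can be illustrated numerically: a marginal item characteristic curve that averages two 2PL curves (two latent classes with different item parameters) stays bounded away from zero at low ability, which a 3PL fit would absorb into a pseudo-guessing parameter. The class parameters below are arbitrary illustrative values, not those of the note's example.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    """3PL item response function with lower asymptote c."""
    return c + (1.0 - c) * p_2pl(theta, a, b)

theta = np.linspace(-4.0, 4.0, 81)
# Two unobserved classes answering the same item: one finds it hard,
# the other finds it easy; the marginal curve mixes them 50/50.
marginal = 0.5 * p_2pl(theta, 1.5, 1.0) + 0.5 * p_2pl(theta, 1.0, -3.0)
```

At theta = -4 the marginal probability is about 0.13 even though neither class involves any guessing, so a 3PL model fit to the pooled data can report a sizeable pseudo-guessing estimate.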


1999 ◽  
Vol 24 (3) ◽  
pp. 293-322 ◽  
Author(s):  
Louis A. Roussos ◽  
Deborah L. Schnipke ◽  
Peter J. Pashley

The present study derives a general formula for the population parameter being estimated by the Mantel-Haenszel (MH) differential item functioning (DIF) statistic. Because the formula is general, it is appropriate for either uniform DIF (defined as a difference in item response theory item difficulty values) or nonuniform DIF, and it can be used regardless of the form of the item response function. In the case of uniform DIF modeled with two-parameter-logistic response functions, the parameter is well known to be linearly related to the difference in item difficulty between the focal and reference groups. Even though this relationship is known not to hold strictly in the case of three-parameter-logistic (3PL) uniform DIF, the degree of departure from this relationship has not been known and has generally been believed to be small. By evaluating the MH DIF parameter, we show that for items of medium or high difficulty, the parameter is much smaller in absolute value than expected based on the difference in item difficulty between the two groups. These results shed new light on results from previous simulation studies that showed the MH DIF statistic has a tendency to shrink toward zero with increasing difficulty level when used with 3PL data.
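For readers unfamiliar with the statistic, the MH common odds ratio and its ETS delta-scale transform can be computed as follows; the 2x2 tables are aggregated over matching-score strata (rows: reference/focal group, columns: correct/incorrect). This is a generic sketch of the standard estimator, not the authors' derivation.

```python
import numpy as np

def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio over score strata.

    tables: iterable of 2x2 arrays [[A, B], [C, D]] per stratum, where
    A/B are the reference group's correct/incorrect counts and
    C/D are the focal group's counts.
    """
    num = den = 0.0
    for t in tables:
        (A, B), (C, D) = np.asarray(t, float)
        n = A + B + C + D
        num += A * D / n
        den += B * C / n
    return num / den

def mh_d_dif(tables):
    """ETS delta-scale MH D-DIF statistic: -2.35 * ln(alpha_MH)."""
    return -2.35 * np.log(mh_odds_ratio(tables))
```

A value of 1 for the odds ratio (0 on the delta scale) indicates no DIF; values above 1 favor the reference group on the studied item.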



2019 ◽  
Vol 34 (6) ◽  
pp. 873-873
Author(s):  
W Goette ◽  
A Schmitt ◽  
J Nici

Abstract Objective To identify item parameter estimates for the Halstead Category Test (HCT). Previous item response analyses have been conducted on the HCT, but without implementing item response theory methods. Method Data were collected from a diagnostically heterogeneous sample of 211 adults (110 males, 101 females) referred for neuropsychological evaluation. The sample had an average educational attainment of 14.18 years (SD = 3.05 years) and an average age of 59.75 years (SD = 18.28). Responses from items on Subtests III-VII were dichotomously coded (0 = incorrect, 1 = correct). A two-parameter, hierarchical, logistic item response model was fit to the data using code in Stan, which uses an adaptive variant of Hamiltonian Monte Carlo. Results The model converged appropriately, with posterior estimates of item parameters all demonstrating adequate effective sample sizes (min. = 3485.74) and Rhat values (max. = 1.002). The posterior difficulty estimates ranged from -1.06 to 2.07 (III), -1.67 to 1.92 (IV), -3.80 to 2.62 (V), -2.35 to 4.38 (VI), and -2.28 to 1.80 (VII). The posterior discrimination estimates ranged from 0.20 to 5.41 (III), 0.35 to 8.17 (IV), 0.11 to 4.14 (V), 0.69 to 5.88 (VI), and 0.53 to 2.83 (VII). Conclusions The HCT demonstrates a wide range of item difficulties, with few items being excessively difficult, though some such items were identified in Subtest VI. Ranges for item discriminations are also wide, with some items returning high estimates, which may be related to the smaller sample size for a two-parameter model or to less-than-ideal item functioning. These findings support the longstanding sensitivity of the HCT to a variety of neurological conditions and across the severity spectrum.
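The core of the fitted model is the 2PL likelihood for a dichotomously coded response matrix; the hierarchical Stan model adds priors on the item parameters and samples the posterior with HMC, but the data model itself can be sketched as below (a generic 2PL log-likelihood, not the authors' Stan code).

```python
import numpy as np

def loglik_2pl(resp, theta, a, b):
    """2PL log-likelihood for a persons-by-items 0/1 response matrix.

    theta: person abilities, shape (n_persons,)
    a, b:  item discriminations and difficulties, shape (n_items,)
    """
    resp = np.asarray(resp, float)
    theta = np.asarray(theta, float)
    a, b = np.asarray(a, float), np.asarray(b, float)
    # P(correct) for every person-item pair via broadcasting
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    return float(np.sum(resp * np.log(p) + (1.0 - resp) * np.log1p(-p)))
```

An HMC sampler then explores the joint posterior of theta, a, and b under this likelihood together with the hierarchical priors.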


Mathematics ◽  
2021 ◽  
Vol 9 (13) ◽  
pp. 1465
Author(s):  
Alexander Robitzsch

This article shows that the recently proposed latent D-scoring model of Dimitrov is statistically equivalent to the two-parameter logistic item response model. An analytical derivation and a numerical illustration are employed for demonstrating this finding. Hence, estimation techniques for the two-parameter logistic model can be used for estimating the latent D-scoring model. In an empirical example using PISA data, differences of country ranks are investigated when using different metrics for the latent trait. In the example, the choice of the latent trait metric matters for the ranking of countries. Finally, it is argued that an item response model with bounded latent trait values like the latent D-scoring model might have advantages for reporting results in terms of interpretation.


Author(s):  
Dr. Wokoma T. Abbott

Differential item functioning (DIF) occurs as a result of differences in the person parameters of the individuals being examined, even when item parameters remain constant during testing. This postulate of item response theory (IRT) was demonstrated in this work. The study investigated whether DIF detection methods have the same DIF detection sensitivity. A comparative research design formed the framework of the study. Transformed item difficulties (TID), Mantel-Haenszel (MH), standardization, logistic regression, Raju's area, and Lord's chi-square methods were compared. The study used 400 Vocational One students (200 males as the reference group and 200 females as the focal group) in Rivers State, Nigeria. The multiple-choice computer science items of the 2019 junior school certificate examination (JSCE) were adapted as the instrument for data collection; they were administered to the students and scored dichotomously. Difficulty and discrimination parameters of the items were analyzed under the 2PL model of IRT with the ltm package, and ogives of the items were plotted with the ggplot2 package. Individual DIF methods and dichoDif in difR were used to detect DIF and compare the methods. The results revealed that all the items of the test functioned differently between the reference group and the focal group, as shown in the item characteristic curves (ICCs). In the comparison of the DIF detection methods, the standardization method detected the most DIF items, followed by the logistic regression method and then Lord's chi-square method; the transformed item difficulties method detected more than the Mantel-Haenszel method, and Raju's area method could not detect any. In light of these findings, it was recommended that the best DIF detection methods (possibly a combination of them) be used to identify DIF items in tests. KEYWORDS: Item response theory, differential item functioning, item characteristic curve, item parameters.
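As a sketch of one of the compared procedures, the logistic regression method tests an item for DIF by comparing a matching-score-only model against one that adds group and score-by-group terms (capturing uniform and nonuniform DIF, respectively). The Newton-Raphson fitter and the thresholds below are illustrative assumptions, not the study's analysis code.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        H = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta += np.linalg.solve(H, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return beta, float(np.sum(y * np.log(p) + (1.0 - y) * np.log1p(-p)))

def lr_dif_statistic(score, group, item):
    """Likelihood-ratio DIF test: G2 is approximately chi-square (2 df) under no DIF."""
    one = np.ones_like(score)
    _, ll0 = fit_logistic(np.column_stack([one, score]), item)
    _, ll1 = fit_logistic(np.column_stack([one, score, group, score * group]), item)
    return 2.0 * (ll1 - ll0)
```

A large G2 relative to the chi-square(2) reference distribution flags the item as functioning differently for the focal group after matching on the score.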

