simulate_CAT: A Computer Program for Post-Hoc Simulation for Computerized Adaptive Testing

Author(s):  
İlker KALENDER
SAGE Open ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 215824401989904
Author(s):  
Wenyi Wang ◽  
Lihong Song ◽  
Teng Wang ◽  
Peng Gao ◽  
Jian Xiong

The purpose of this study is to investigate the relationship between the Shannon entropy procedure and the Jensen–Shannon divergence (JSD), which are used as item selection criteria in cognitive diagnostic computerized adaptive testing (CD-CAT). Because the JSD itself is defined by the Shannon entropy, we apply the well-known relationship between the JSD and Shannon entropy to establish a relationship between the item selection criteria that are based on these two measures. To understand the relationship between these two item selection criteria better, an alternative approach is also provided. Theoretical derivations and empirical examples have shown that the Shannon entropy procedure and the JSD in CD-CAT have a linear relation under cognitive diagnostic models. Consistent with our theoretical conclusions, simulation results have shown that the two item selection criteria behaved quite similarly in terms of attribute-level and pattern recovery rates under all conditions, and that they selected the same set of items for each examinee from an item bank with item parameters drawn from a uniform distribution U(0.1, 0.3) under post hoc simulations. We provide some suggestions for future studies and a discussion of the relationship between the modified posterior-weighted Kullback–Leibler index and the G-DINA (generalized deterministic inputs, noisy “and” gate) discrimination index.
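The linear relation described in this abstract can be checked numerically with the standard mutual-information identity: for one dichotomous item, the expected posterior Shannon entropy plus the posterior-weighted JSD equals the entropy of the current posterior. The following is a minimal sketch under stated assumptions, not the authors' code: the function names, the four-class posterior, and the success probabilities are all hypothetical, and the paper's exact formulation of the two indices may differ in detail.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def jsd(posterior, p_correct):
    """Posterior-weighted JSD of the item response distributions.

    posterior[c]  : current posterior probability of latent class c
    p_correct[c]  : assumed P(X = 1 | class c) for the candidate item
    """
    p1 = sum(w * p for w, p in zip(posterior, p_correct))  # marginal P(X = 1)
    mixture_entropy = entropy([1 - p1, p1])
    mean_entropy = sum(w * entropy([1 - p, p])
                       for w, p in zip(posterior, p_correct))
    return mixture_entropy - mean_entropy

def expected_posterior_entropy(posterior, p_correct):
    """Expected Shannon entropy of the posterior after observing the item."""
    total = 0.0
    for x in (0, 1):
        # Likelihood of response x in each latent class.
        like = [p if x == 1 else 1 - p for p in p_correct]
        marginal = sum(w * l for w, l in zip(posterior, like))
        if marginal > 0:
            updated = [w * l / marginal for w, l in zip(posterior, like)]
            total += marginal * entropy(updated)
    return total

# Hypothetical posterior over four attribute patterns and item parameters.
posterior = [0.4, 0.3, 0.2, 0.1]
p_correct = [0.2, 0.2, 0.8, 0.8]

she = expected_posterior_entropy(posterior, p_correct)
divergence = jsd(posterior, p_correct)
current = entropy(posterior)
print(she + divergence, current)  # the two values agree up to rounding
```

Because the current posterior entropy is the same for every candidate item at a given step, minimizing the expected entropy and maximizing the JSD rank items identically in this sketch.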


Author(s):  
Dong Gi Seo ◽  
Jeongwook Choi

Purpose: Computerized adaptive testing (CAT) has been adopted in licensing examinations because it improves the efficiency and accuracy of the tests, as shown in many studies. This simulation study investigated CAT scoring and item selection methods for the Korean Medical Licensing Examination (KMLE). Methods: This study used a post-hoc (real data) simulation design. The item bank used in this study included all items from the January 2017 KMLE. All CAT algorithms for this study were implemented using the ‘catR’ package in the R program. Results: In terms of accuracy, the Rasch and 2-parameter logistic (2PL) models performed better than the 3PL model. The ‘modal a posteriori’ and ‘expected a posteriori’ methods provided more accurate estimates than maximum likelihood estimation or weighted likelihood estimation. Furthermore, maximum posterior weighted information and minimum expected posterior variance performed better than other item selection methods. In terms of efficiency, the Rasch model is recommended to reduce test length. Conclusion: Before implementing live CAT, a simulation study should be performed under varied test conditions. Based on the results, specific scoring and item selection methods should be predetermined.


1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Summary: Item parameters for several hundred items were estimated based on empirical data from several thousand subjects. The logistic one-parameter (1PL) and two-parameter (2PL) model estimates were evaluated. However, model fit showed that only a subset of items complied sufficiently, so the items that remained were assembled into well-fitting item banks. In several simulation studies, 5000 simulated responses were generated in accordance with a computerized adaptive test procedure along with person parameters. A general reliability of .80 or a standard error of measurement of .44 was used as a stopping rule to end CAT testing. We also recorded how often each item was used by all simulees. Person-parameter estimates based on CAT correlated higher than .90 with the simulated true values. For all 1PL-fitting item banks, most simulees used more than 20 items but fewer than 30 items to reach the pre-set level of measurement error. However, testing based on item banks that complied with the 2PL revealed that, on average, only 10 items were sufficient to end testing at the same measurement error level. Both clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from everyday uses will show whether these trends will hold up in practice. If so, CAT will become possible and reasonable with some 150 well-calibrated 2PL items.
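The stopping rule in this abstract pairs a reliability of .80 with a standard error of measurement of about .44. The two are linked by SEM = SD × sqrt(1 − reliability) when the ability scale has unit variance. The sketch below works through that arithmetic and a simple information-based stopping check. It is an illustration only: unit variance, the function names, the item parameters, and the threshold of 0.45 are assumptions, and the items are perfectly targeted at the examinee, which the original study does not claim.

```python
import math

def sem_from_reliability(reliability, sd=1.0):
    """Standard error of measurement implied by a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def items_needed(theta, items, target_sem):
    """Count items administered until 1 / sqrt(total information) <= target."""
    total_info = 0.0
    for count, (a, b) in enumerate(items, start=1):
        total_info += info_2pl(theta, a, b)
        if 1.0 / math.sqrt(total_info) <= target_sem:
            return count
    return None  # the bank ran out before the target was reached

sem = sem_from_reliability(0.80)  # roughly 0.447 on a unit-variance scale

# Hypothetical banks of items targeted exactly at theta = 0.
low_discrimination = [(1.0, 0.0)] * 40
high_discrimination = [(2.0, 0.0)] * 40

n_low = items_needed(0.0, low_discrimination, target_sem=0.45)
n_high = items_needed(0.0, high_discrimination, target_sem=0.45)
print(round(sem, 3), n_low, n_high)
```

Under these assumptions, doubling the discrimination quadruples the information per item, which is one way to see why a bank of more discriminating items can reach the same error level with far fewer items.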


Methodology ◽  
2007 ◽  
Vol 3 (1) ◽  
pp. 14-23 ◽  
Author(s):  
Juan Ramon Barrada ◽  
Julio Olea ◽  
Vicente Ponsoda

Abstract. The Sympson-Hetter (1985) method provides a means of controlling the maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that mark the probability of administration of an item on being selected. This method presents two main problems: it requires a long computation time for calculating the parameters, and the maximum exposure rate is slightly above the fixed limit. Van der Linden (2003) presented two alternatives which appear to solve both of the problems. The impact of these methods on measurement accuracy has not been tested yet. We show how these methods over-restrict the exposure of some highly discriminating items and, thus, the accuracy is decreased. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods offer an empirical maximum exposure rate clearly above the goal. A new method, based on the initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with van der Linden's two methods. This option, when used with Sympson-Hetter, speeds the convergence of the control parameters without decreasing the accuracy.
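The control parameters mentioned in this abstract can be sketched as follows. In the usual textbook description of the Sympson-Hetter procedure, each item i has a parameter K_i = P(administered | selected); after each simulation round, K_i is set to r_max / P(selected) when the selection rate exceeds the target r_max, and to 1 otherwise. The code below is a simplified sketch under that description, not the implementation used in the paper: the function names and the example rates are hypothetical, and the usual safeguard that keeps enough items with K_i = 1 to fill a test is omitted.

```python
import random

def update_control_parameters(selection_rates, r_max):
    """One Sympson-Hetter update of the exposure control parameters.

    selection_rates[i] is the proportion of simulees for whom item i was
    selected in the latest simulation round.
    """
    return [1.0 if ps <= r_max else r_max / ps for ps in selection_rates]

def administer_first_passing(ranked_items, k, rng):
    """Walk the items in order of preference and administer the first one
    that passes its probabilistic exposure filter."""
    for item in ranked_items:
        if rng.random() < k[item]:
            return item
    return None  # every candidate was rejected by its filter

# Hypothetical selection rates from one simulation round, target rate 0.25.
rates = [0.10, 0.50, 0.25, 1.00]
k = update_control_parameters(rates, r_max=0.25)
print(k)  # [1.0, 0.5, 1.0, 0.25]

rng = random.Random(0)
chosen = administer_first_passing([3, 1, 0], k, rng)
```

In practice the update is repeated over many simulation rounds until the parameters stabilize, which is the long computation the abstract refers to.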

