scholarly journals Evaluating Different Equating Setups in the Continuous Item Pool Calibration for Computerized Adaptive Testing

2019 ◽  
Vol 10 ◽  
Author(s):  
Sebastian Born ◽  
Aron Fink ◽  
Christian Spoden ◽  
Andreas Frey
2021 ◽  
pp. 073428292110277
Author(s):  
Ioannis Tsaousis ◽  
Georgios D. Sideridis ◽  
Hannan M. AlGhamdi

This study evaluated the psychometric quality of a computerized adaptive testing (CAT) version of the general cognitive ability test (GCAT), using a simulation study protocol put forth by Han, K. T. (2018a). For the needs of the analysis, three different sets of items were generated, providing an item pool of 165 items. Before evaluating the efficiency of the GCAT, all items in the final item pool were linked (equated), following a sequential approach. Data were generated using a standard normal for 10,000 virtual individuals ( M = 0 and SD = 1). Using the measure’s 165-item bank, the ability value (θ) for each participant was estimated. maximum Fisher information (MFI) and maximum likelihood estimation with fences (MLEF) were used as item selection and score estimation methods, respectively. For item exposure control, the fade away method (FAM) was preferred. The termination criterion involved a minimum SE ≤ 0.33. The study revealed that the average number of items administered for 10,000 participants was 15. Moreover, the precision level in estimating the participant’s ability score was very high, as demonstrated by the CBIAS, CMAE, and CRMSE). It is concluded that the CAT version of the test is a promising alternative to administering the corresponding full-length measure since it reduces the number of administered items, prevents high rates of item exposure, and provides accurate scores with minimum measurement error.


2020 ◽  
Author(s):  
Menghua She ◽  
Yaling Li ◽  
Dongbo Tu ◽  
Yan Cai

Abstract Background: As more and more people suffer from sleep disorders, developing an efficient, cheap and accurate assessment tool for screening sleep disorders is becoming more urgent. This study developed a computerized adaptive testing for sleep disorders (CAT-SD). Methods: A large sample of 1,304 participants was recruited to construct the item pool of CAT-SD and to investigate the psychometric characteristics of CAT-SD. More specifically, firstly the analyses of unidimensionality, model fit, item fit, item discrimination parameter and differential item functioning (DIF) were conducted to construct a final item pool which meets the requirements of item response theory (IRT) measurement. In addition, a simulated CAT study with real response data of participants was performed to investigate the psychometric characteristics of CAT-SD, including reliability, validity and predictive utility (sensitivity and specificity). Results: The final unidimensional item bank of the CAT-SD not only had good item fit, high discrimination and no DIF; Moreover, it had acceptable reliability, validity and predictive utility. Conclusions: The CAT-SD could be used as an effective and accurate assessment tool for measuring individuals' severity of the sleep disorders and offers a bran-new perspective for screening of sleep disorders with psychological scales.


2021 ◽  
pp. 001316442110238
Author(s):  
Chansoon Lee ◽  
Hong Qian

Using classical test theory and item response theory, this study applied sequential procedures to a real operational item pool in a variable-length computerized adaptive testing (CAT) to detect items whose security may be compromised. Moreover, this study proposed a hybrid threshold approach to improve the detection power of the sequential procedure while controlling the Type I error rate. The hybrid threshold approach uses a local threshold for each item in an early stage of the CAT administration, and then it uses the global threshold in the decision-making stage. Applying various simulation factors, a series of simulation studies examined which factors contribute significantly to the power rate and lag time of the procedure. In addition to the simulation study, a case study investigated whether the procedures are applicable to the real item pool administered in CAT and can identify potentially compromised items in the pool. This research found that the increment of probability of a correct answer ( p-increment) was the simulation factor most important to the sequential procedures’ ability to detect compromised items. This study also found that the local threshold approach improved power rates and shortened lag times when the p-increment was small. The findings of this study could help practitioners implement the sequential procedures using the hybrid threshold approach in real-time CAT administration.


2021 ◽  
Vol 11 ◽  
Author(s):  
Yaling Li ◽  
Menghua She ◽  
Dongbo Tu ◽  
Yan Cai

As schizotypal personality disorder (SPD) increasingly prevails in the general population, a rapid and comprehensive measurement instrument is imperative to screen individuals at risk for SPD. To address this issue, we aimed to develop a computerized adaptive testing for SPD (CAT-SPD) using a non-clinical Chinese sample (N = 999), consisting of a calibration sample (N1 = 497) and a validation sample (N2 = 502). The item pool of SPD was constructed from several widely used SPD scales and statistical analyses based on the item response theory (IRT) via a calibration sample using a graded response model (GRM). Finally, 90 items, which measured at least one symptom of diagnostic criteria of SPD in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and had local independence, good item fit, high slope, and no differential item functioning (DIF), composed the final item pool for the CAT-SPD. In addition, a simulated CAT was conducted in an independent validation sample to assess the performance of the CAT-SPD. Results showed that the CAT-SPD not only had acceptable reliability, validity, and predictive utility but also had shorter but efficient assessment of SPD which can save significant time and reduce the test burden of individuals with less information loss.


2018 ◽  
Vol 79 (2) ◽  
pp. 335-357 ◽  
Author(s):  
Chuan-Ju Lin ◽  
Hua-Hua Chang

For item selection in cognitive diagnostic computerized adaptive testing (CD-CAT), ideally, a single item selection index should be created to simultaneously regulate precision, exposure status, and attribute balancing. For this purpose, in this study, we first proposed an attribute-balanced item selection criterion, namely, the standardized weighted deviation global discrimination index (SWDGDI), and subsequently formulated the constrained progressive index (CP_SWDGDI) by casting the SWDGDI in a progressive algorithm. A simulation study revealed that the SWDGDI method was effective in balancing attribute coverage and the CP_SWDGDI method was able to simultaneously balance attribute coverage and item pool usage while maintaining acceptable estimation precision. This research also demonstrates the advantage of a relatively low number of attributes in CD-CAT applications.


1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Summary: Item parameters for several hundreds of items were estimated based on empirical data from several thousands of subjects. The logistic one-parameter (1PL) and two-parameter (2PL) model estimates were evaluated. However, model fit showed that only a subset of items complied sufficiently, so that the remaining ones were assembled in well-fitting item banks. In several simulation studies 5000 simulated responses were generated in accordance with a computerized adaptive test procedure along with person parameters. A general reliability of .80 or a standard error of measurement of .44 was used as a stopping rule to end CAT testing. We also recorded how often each item was used by all simulees. Person-parameter estimates based on CAT correlated higher than .90 with true values simulated. For all 1PL fitting item banks most simulees used more than 20 items but less than 30 items to reach the pre-set level of measurement error. However, testing based on item banks that complied to the 2PL revealed that, on average, only 10 items were sufficient to end testing at the same measurement error level. Both clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from everyday uses will show whether these trends will hold up in practice. If so, CAT will become possible and reasonable with some 150 well-calibrated 2PL items.


Sign in / Sign up

Export Citation Format

Share Document