Computerized Adaptive Testing for Public Opinion Surveys

Jacob M. Montgomery; Josh Cutler

doi:10.1093/pan/mps060

Computerized Adaptive Testing for Public Opinion Surveys

Political Analysis, 2013, Vol 21 (2), pp. 172-192

doi:10.1093/pan/mps060

Cited By ~ 12

Author(s): Jacob M. Montgomery; Josh Cutler

Keyword(s): Public Opinion; Measurement Accuracy; Computerized Adaptive Testing; Political Knowledge; Adaptive Testing; Measurement Precision; Public Opinion Surveys; Accuracy And Precision; Latent Dimension; Latent Traits

Survey researchers avoid using large multi-item scales to measure latent traits due to both the financial costs and the risk of driving up nonresponse rates. Typically, investigators select a subset of available scale items rather than asking the full battery. Reduced batteries, however, can sharply reduce measurement precision and introduce bias. In this article, we present computerized adaptive testing (CAT) as a method for minimizing the number of questions each respondent must answer while preserving measurement accuracy and precision. CAT algorithms respond to individuals' previous answers to select subsequent questions that most efficiently reveal respondents' positions on a latent dimension. We introduce the basic stages of a CAT algorithm and present the details for one approach to item selection appropriate for public opinion research. We then demonstrate the advantages of CAT via simulation and empirically comparing dynamic and static measures of political knowledge.
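
The abstract describes the CAT loop only in outline: estimate the respondent's position, pick the most informative remaining question, ask it, and update. A minimal sketch of that loop follows. It assumes a two-parameter logistic (2PL) response model, maximum Fisher information item selection, and a grid-based posterior mean with a standard normal prior. The article's own selection rule may differ, and every function and parameter name here is hypothetical.

```python
import numpy as np

def prob_2pl(theta, a, b):
    """Probability of a correct answer under an assumed 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of one 2PL item at ability theta."""
    p = prob_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

def eap_estimate(responses, a, b, grid):
    """Posterior mean and SD of theta on a grid (standard normal prior)."""
    log_post = -0.5 * grid**2
    for item, y in responses:
        p = prob_2pl(grid, a[item], b[item])
        log_post = log_post + np.log(p if y else 1.0 - p)
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    mean = float(np.sum(w * grid))
    sd = float(np.sqrt(np.sum(w * (grid - mean) ** 2)))
    return mean, sd

def run_cat(answer_fn, a, b, max_items=5):
    """Administer up to max_items questions adaptively.

    answer_fn(item) returns 1 (correct) or 0 (incorrect) for the asked item.
    Returns the final estimate, its posterior SD, and the items asked.
    """
    grid = np.linspace(-4.0, 4.0, 161)
    responses, theta, sd = [], 0.0, 1.0
    remaining = set(range(len(a)))
    for _ in range(min(max_items, len(a))):
        # Select the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda j: fisher_info(theta, a[j], b[j]))
        remaining.remove(item)
        responses.append((item, answer_fn(item)))
        theta, sd = eap_estimate(responses, a, b, grid)
    return theta, sd, [i for i, _ in responses]
```

With equal discriminations, a respondent who answers every question correctly is routed to progressively harder items, which is the adaptive behavior the abstract refers to.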

An Item Response Theory–Based, Computerized Adaptive Testing Version of the MacArthur–Bates Communicative Development Inventory: Words & Sentences (CDI:WS)

Journal of Speech Language and Hearing Research, 2016, Vol 59 (2), pp. 281-289

doi:10.1044/2015_jslhr-l-15-0202

Cited By ~ 5

Author(s): Guido Makransky; Philip S. Dale; Philip Havmose; Dorthe Bleses

Keyword(s): Item Response Theory; Item Response; Computerized Adaptive Testing; Real Data; Adaptive Testing; Measurement Precision; Response Theory; Irt Model; Communicative Development; Accuracy And Precision

Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)–based computerized adaptive testing (CAT) version of the MacArthur–Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining measurement precision. Method: Parent-reported vocabulary for the American CDI:WS norming sample consisting of 1,461 children between the ages of 16 and 30 months was used to investigate the fit of the items to the 2-parameter logistic IRT model and to simulate CDI-CAT versions with 400, 200, 100, 50, 25, 10, and 5 items. Results: All but 14 items fit the 2-parameter logistic IRT model, and real data simulations of CDI-CATs with at least 50 items recovered full CDI scores with correlations over .95. Furthermore, the CDI-CATs with at least 50 items had similar correlations with age and socioeconomic status as the full CDI:WS. Conclusion: These results provide strong evidence that a CAT version of the CDI:WS has the potential to reduce length while maintaining the accuracy and precision of the full instrument.
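
For readers unfamiliar with the 2-parameter logistic model named in the abstract, its standard textbook form is given below. The notation is an assumption made here, not taken from the paper: theta_i is child i's latent vocabulary level, a_j is the discrimination of word j, and b_j is its difficulty. The second expression is the item information that a CAT would typically use to rank candidate items.

```latex
P(y_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}},
\qquad
I_j(\theta) = a_j^{2}\, P_j(\theta)\,\bigl(1 - P_j(\theta)\bigr)
```

Information peaks where theta equals b_j, so a short CAT can concentrate on words whose difficulty is close to the child's current estimate.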

Methods for Restricting Maximum Exposure Rate in Computerized Adaptative Testing

Methodology, 2007, Vol 3 (1), pp. 14-23

doi:10.1027/1614-2241.3.1.14

Cited By ~ 9

Author(s): Juan Ramon Barrada; Julio Olea; Vicente Ponsoda

Keyword(s): Measurement Accuracy; Computerized Adaptive Testing; Computation Time; Adaptive Testing; Exposure Rate; Control Parameters; The Impact; Two Alternatives; Selection Of; Maximum Exposure

Abstract. The Sympson-Hetter (1985) method provides a means of controlling the maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that mark the probability of administration of an item once it has been selected. This method presents two main problems: it requires a long computation time for calculating the parameters, and the maximum exposure rate is slightly above the fixed limit. Van der Linden (2003) presented two alternatives which appear to solve both of the problems. The impact of these methods on measurement accuracy has not yet been tested. We show how these methods over-restrict the exposure of some highly discriminating items and, thus, the accuracy is decreased. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods offer an empirical maximum exposure rate clearly above the goal. A new method, based on the initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with van der Linden's two methods. This option, when used with Sympson-Hetter, speeds the convergence of the control parameters without decreasing the accuracy.
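
The administration step that the abstract attributes to Sympson-Hetter can be sketched as a probabilistic filter: each selected item carries a control parameter, read here as the probability that it is administered given that it was selected. The code below is an illustrative sketch only. The function name, the dictionary of control parameters `k`, and the choice to return `None` when every candidate is rejected are assumptions, and the simulation procedure that calibrates the parameters is not shown.

```python
import random

def sympson_hetter_pick(ranked_items, k, rng):
    """Walk the candidates in order of preference and apply the exposure filter.

    ranked_items: item ids, best candidate first.
    k: dict mapping item id to its control parameter in [0, 1].
    rng: a random.Random instance.
    Returns (administered_item_or_None, rejected_items).
    """
    rejected = []
    for item in ranked_items:
        # Administer the selected item with probability k[item].
        if rng.random() < k[item]:
            return item, rejected
        # Otherwise set it aside for this examinee and try the next candidate.
        rejected.append(item)
    return None, rejected
```

An item that is always the top candidate but has a control parameter of 0.3 is administered to roughly 30% of examinees, which is how the method caps exposure.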

Measurement Precision and Efficiency of Computerized Adaptive Testing for the Activities-specific Balance Confidence Scale in People With Stroke

Physical Therapy, 2021

doi:10.1093/ptj/pzab020

Author(s): Bryant A Seamon; Steven A Kautz; Craig A Velozo

Keyword(s): Rasch Model; Computerized Adaptive Testing; Adaptive Testing; Measurement Precision; Strongly Correlated; Computerized Adaptive Test; Balance Confidence; Adaptive Test; The Rasch Model; Confidence Scale

Abstract. Objective: Administrative burden often prevents clinical assessment of balance confidence in people with stroke. A computerized adaptive test (CAT) version of the Activities-specific Balance Confidence Scale (ABC CAT) can dramatically reduce this burden. The objective of this study was to test balance confidence measurement precision and efficiency in people with stroke with an ABC CAT. Methods: We conducted a retrospective cross-sectional simulation study with data from 406 adults approximately 2 months post-stroke in the Locomotor-Experience Applied Post-Stroke (LEAPS) trial. Item parameters for CAT calibration were estimated with the Rasch model using a random sample of participants (n = 203). Computer simulation was used with response data from the remaining 203 participants to evaluate the ABC CAT algorithm under varying stopping criteria. We compared estimated levels of balance confidence from each simulation to actual levels predicted from the Rasch model (Pearson correlations and mean standard error (SE)). Results: Results from simulations with number of items as a stopping criterion strongly correlated with actual ABC scores (full item, r = 1; 12-item, r = 0.994; 8-item, r = 0.98; 4-item, r = 0.929). Mean SE increased with decreasing number of items administered (full item, SE = 0.31; 12-item, SE = 0.33; 8-item, SE = 0.38; 4-item, SE = 0.49). A precision-based stopping rule (mean SE = 0.5) also strongly correlated with actual ABC scores (r = .941) and optimized the relationship between the number of items administered and precision (mean number of items 4.37, range [4–9]). Conclusions: An ABC CAT can determine accurate and precise measures of balance confidence in people with stroke with as few as 4 items. Individuals with lower balance confidence may require a greater number of items (up to 9), which may be attributable to the LEAPS trial excluding more functionally impaired persons.
Impact Statement: Computerized adaptive testing can drastically reduce the ABC’s test administration time while maintaining accuracy and precision. This should greatly enhance clinical utility, facilitating adoption of clinical practice guidelines in stroke rehabilitation. Lay Summary: If you have had a stroke, your physical therapist will likely test your balance confidence. A computerized adaptive test version of the ABC scale can accurately identify balance confidence with as few as 4 questions, which takes much less time.
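
The precision-based stopping rule can be related to test information through a standard large-sample result for maximum likelihood ability estimates. The abstract does not say how the SE was computed, so the relation below is offered as a common convention rather than as the authors' formula. Here I is the total information of the administered items at the current estimate.

```latex
\mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\hat{\theta})}},
\qquad
\mathrm{SE}(\hat{\theta}) \le 0.5 \iff I(\hat{\theta}) \ge 4
```

Under this reading, the CAT keeps administering items until the accumulated information reaches about 4, which respondents whose level is poorly matched by the item bank would reach more slowly.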

The Impact of Item Calibration Error on Variable-Length Cognitive Diagnostic Computerized Adaptive Testing

Frontiers in Psychology, 2020, Vol 11

doi:10.3389/fpsyg.2020.575141

Author(s): Xiaojian Sun; Yanlou Liu; Tao Xin; Naiqing Song

Keyword(s): Adverse Effects; Measurement Accuracy; Computerized Adaptive Testing; Adaptive Testing; Unified Model; Variable Length; Calibration Error; Test Length; Test Efficiency; Item Parameters

Calibration errors are inevitable and should not be ignored during the estimation of item parameters. Items with calibration error can affect the measurement results of tests. One purpose of the current study is to investigate the impact of calibration errors in the estimation of item parameters on measurement accuracy, average test length, and test efficiency for variable-length cognitive diagnostic computerized adaptive testing (CD-CAT). The other purpose is to examine methods for reducing the adverse effects of calibration errors. Simulation results show that (1) calibration error has a negative effect on measurement accuracy for the deterministic input, noisy “and” gate (DINA) model and the reduced reparameterized unified model; (2) the average test length is shorter, and the test efficiency is overestimated, for items with calibration errors; (3) the compensatory reparameterized unified model (CRUM) is less affected by calibration errors, and the classification accuracy, average test length, and test efficiency are relatively stable in the CRUM framework; (4) methods such as improving the quality of items, using a large calibration sample to calibrate the item parameters, and using a cross-validation method can reduce the adverse effects of calibration errors on CD-CAT.
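
To make the role of item parameters concrete, the standard DINA response function can be sketched as follows. An examinee who has mastered every attribute an item requires answers correctly unless they slip, and anyone else answers correctly only by guessing. The function and argument names are hypothetical, and the sketch does not reproduce the study's simulation design.

```python
def dina_prob(alpha, q_row, slip, guess):
    """Probability of a correct response under the DINA model.

    alpha: examinee's attribute mastery pattern (sequence of 0/1).
    q_row: the item's Q-matrix row (1 = attribute required).
    slip, guess: the item's slip and guessing parameters.
    """
    # The conjunctive ("and" gate) rule: all required attributes must be mastered.
    has_all = all(a == 1 for a, needed in zip(alpha, q_row) if needed)
    return 1.0 - slip if has_all else guess
```

If the calibrated slip or guess value differs from the true one, every probability the CAT computes for that item is shifted, which is one way calibration error can propagate into classification and stopping decisions.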

Applying Computerized Adaptive Testing to the Four-Dimensional Symptom Questionnaire (4DSQ): A Simulation Study

JMIR Mental Health, 2017, Vol 4 (1), pp. e7

doi:10.2196/mental.6545

Cited By ~ 1

Author(s): Tessa Magnée; Derek P de Beurs; Berend Terluin; Peter F Verhaak

Keyword(s): Mental Health; General Practice; Simulation Study; Computerized Adaptive Testing; Mental Health Problems; Adaptive Testing; Health Problems; Measurement Precision; Stopping Rule; Symptom Questionnaire

Background: Efficient screening questionnaires are useful in general practice. Computerized adaptive testing (CAT) is a method to improve the efficiency of questionnaires, as only the items that are particularly informative for a certain responder are dynamically selected. Objective: The objective of this study was to test whether CAT could improve the efficiency of the Four-Dimensional Symptom Questionnaire (4DSQ), a frequently used self-report questionnaire designed to assess common psychosocial problems in general practice. Methods: A simulation study was conducted using a sample of Dutch patients visiting a general practitioner (GP) with psychological problems (n=379). Responders completed a paper-and-pencil version of the 50-item 4DSQ, and a psychometric evaluation was performed to check if the data agreed with item response theory (IRT) assumptions. Next, a CAT simulation was performed for each of the four 4DSQ scales (distress, depression, anxiety, and somatization), based on the given responses as if they had been collected through CAT. The following two stopping rules were applied for the administration of items: (1) stop if measurement precision is below a predefined level, or (2) stop if more than half of the items of the subscale are administered. Results: In general, the items of each of the four scales agreed with IRT assumptions. Application of the first stopping rule reduced the length of the questionnaire by 38% (from 50 to 31 items on average). When the second stopping rule was also applied, the total number of items could be reduced by 56% (from 50 to 22 items on average). Conclusions: CAT seems useful for improving the efficiency of the 4DSQ by 56% without losing a considerable amount of measurement precision. The CAT version of the 4DSQ may be useful as part of an online assessment to investigate the severity of mental health problems of patients visiting a GP.
This simulation study is the first step needed for the development of a CAT version of the 4DSQ. A CAT version of the 4DSQ could be of high value for Dutch GPs since increasing numbers of patients with mental health problems are visiting the general practice. In further research, the results of a real-time CAT should be compared with the results of the administration of the full scale.
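
The two stopping rules described in the abstract can be combined in a single check, sketched below. The sketch reads "measurement precision is below a predefined level" as the standard error falling below a threshold, which is an interpretation. The threshold value of 0.3 and all names are hypothetical, since the abstract does not report them.

```python
def should_stop(se, n_administered, n_subscale_items, se_threshold=0.3):
    """Return True when either stopping rule is met for one subscale.

    Rule 1: the standard error has dropped below the (assumed) threshold.
    Rule 2: more than half of the subscale's items have been administered.
    """
    precise_enough = se < se_threshold
    over_half = n_administered > n_subscale_items / 2
    return precise_enough or over_half
```

For a 16-item subscale, for example, `should_stop(0.5, 9, 16)` is true because of the second rule even though the precision target has not been reached, which is why adding that rule shortens the questionnaire further.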

The Effects of Item Exposure Control on Measurement Precision of Vocabulary Size Estimates in Computerized Adaptive Testing

English Teaching & Learning, 2021

doi:10.1007/s42321-020-00068-w

Author(s): Wen-Ta Tseng

Keyword(s): Computerized Adaptive Testing; Adaptive Testing; Measurement Precision; Exposure Control; Vocabulary Size; Item Exposure; Item Exposure Control; Size Estimates

Binary Restrictive Threshold Method for Item Exposure Control in Cognitive Diagnostic Computerized Adaptive Testing

Frontiers in Psychology, 2021, Vol 12

doi:10.3389/fpsyg.2021.517155

Author(s): Xiaojian Sun; Yizhu Gao; Tao Xin; Naiqing Song

Keyword(s): Classification Accuracy; Measurement Accuracy; Computerized Adaptive Testing; Critical Issue; Adaptive Testing; Exposure Control; Threshold Method; Item Exposure; Item Exposure Control; Better Than

Although classification accuracy is a critical issue in cognitive diagnostic computerized adaptive testing, attention has increasingly shifted to item exposure control to ensure test security. In this study, we developed the binary restrictive threshold (BRT) method to balance measurement accuracy and item exposure. In addition, a simulation study was conducted to evaluate its performance. The results indicated that the BRT method performed better than the restrictive progressive (RP) and stratified dynamic binary searching (SDBS) approaches but worse than the restrictive threshold (RT) method in terms of classification accuracy. With respect to item exposure control, the BRT method exhibited noticeably stronger performance compared with the RT method, even though its performance was not as high as that of the RP and SDBS methods.

The Effect of Item Exposure Control Methods on Measurement Precision and Test Security under Different Measurement Conditions in Computerized Adaptive Testing

TED EĞİTİM VE BİLİM, 2020

doi:10.15390/eb.2020.8256

Author(s): Recep Gür; H. Deniz Gülleroğlu

Keyword(s): Computerized Adaptive Testing; Adaptive Testing; Measurement Precision; Control Methods; Exposure Control; Test Security; Item Exposure; Item Exposure Control

Comparison of Different Test Termination Rules in Terms of Measurement Precision in Computerized Adaptive Testing

Proceedings of The International Conference on Research in Teaching and Education, 2019

doi:10.33422/rteconf.2019.06.334

Author(s): Arzu Uçar; Ebru Balta

Keyword(s): Computerized Adaptive Testing; Adaptive Testing; Measurement Precision

A new approach to assessing shyness of college students using computerized adaptive testing: CAT-Shyness

Journal of Pacific Rim Psychology, 2020, Vol 14

doi:10.1017/prp.2020.15

Author(s): Zifei Li; Yan Cai; Dongbo Tu

Keyword(s): Psychometric Properties; Sensitivity And Specificity; Computerized Adaptive Testing; Real Data; Adaptive Testing; Measurement Precision; Item Bank; Test Time; Local Independence; Computerized Adaptive Test

Abstract. Assessing shyness symptoms via computerized adaptive testing (CAT) provides greater measurement precision coupled with a lower test burden compared to conventional tests. The computerized adaptive test for shyness (CAT-Shyness) was developed based on a large sample of 1400 participants from China. Item bank development included the investigation of unidimensionality, local independence, and exploration of differential item functioning (DIF). CAT simulations based on the real data were carried out to investigate the reliability, validity, and predictive utility (sensitivity and specificity) of the CAT-Shyness. The CAT-Shyness item bank was successfully built and proved to have excellent psychometric properties: high content validity, unidimensionality, local independence, and no DIF. The CAT simulations needed 14 items to achieve high measurement precision with a reliability of .9. Moreover, the results revealed that the proposed CAT-Shyness had acceptable and reasonable marginal reliability, criterion-related validity, and sensitivity and specificity. It not only had acceptable psychometric properties but also provided a shorter yet efficient assessment of shyness, which can save significant test time and reduce the test burden for individuals with little information loss.
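
A reliability target such as .9 is often translated into a standard error target for the CAT. The conversion below is a common convention that assumes the latent trait is scaled to unit variance. The abstract does not state which definition the authors used, so this is an illustration, not their formula.

```latex
\rho = 1 - \mathrm{SE}^{2}
\quad\Longrightarrow\quad
\mathrm{SE} = \sqrt{1 - 0.9} \approx 0.316
```

On that assumption, "a reliability of .9" corresponds to stopping once the standard error of the shyness estimate falls to roughly 0.32.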