test information function
Recently Published Documents


TOTAL DOCUMENTS: 26 (FIVE YEARS: 4)

H-INDEX: 4 (FIVE YEARS: 0)

Assessment ◽  
2021 ◽  
pp. 107319112110460
Author(s):  
Clark McKown ◽  
Maria Kharitonova ◽  
Nicole M. Russo-Ponsaran ◽  
Beyza Aksu-Dunya

This article describes the development and validation of a shortened form of SELweb EE, a web-based assessment of social–emotional skills in the early elementary grades. Using a Rasch approach, in the first study we used two archival data sets to reduce the number of items in three subtests, creating short forms that maintained item fit, item difficulty, item discrimination, and the range of the test information function. In the second study, we created and administered a short form of SELweb EE to a demographically diverse cross-validation sample of 22,683 students. We evaluated the shortened subtests’ score reliability, their fit to a hypothesized factor structure, and, as evidence of criterion-related validity, their associations with age and other variables. Findings from this study suggest that score reliabilities, factor structure, and criterion-related validity for the short form are similar to the corresponding properties of the long form. In addition, within a confirmatory factor analysis framework, the short form of SELweb EE demonstrated evidence of configural, metric, and scalar invariance across sex, ethnicity, and language. Shortening SELweb EE reduced the mean administration time from 36 to 24 minutes, substantially increasing its usability and feasibility while maintaining its psychometric merit.


Author(s):  
Elizabeth O’Nions ◽  
Francesca Happé ◽  
Essi Viding ◽  
Ilse Noens

Abstract

Objectives: Extreme/“pathological” demand avoidance (PDA) describes a presentation found in some children on the autism spectrum, characterized by obsessive resistance to everyday demands and requests. Demands often trigger avoidance behavior (e.g., distraction, excuses, withdrawal into role play). Pressure to comply can lead to escalation in emotional reactivity and behavior that challenges.

Methods: Previously, the Extreme Demand Avoidance Questionnaire (EDA-Q) was developed to quantify resemblance to clinical accounts of PDA from caregiver reports. The aim of this study was to refine the EDA-Q using principal components analysis (PCA) and item response theory (IRT) analysis on parent/caregiver-report data from 334 children with ASD aged 5–17 years.

Results: PCA and IRT analyses identified eight items that are discriminating indices of EDA traits and that behave similarly in quantifying EDA irrespective of child age, gender, reported academic level, or reported independence in daily living activities. The “EDA-8” showed good internal consistency (Cronbach’s alpha = .90) and convergent and divergent validity with other measures (some of which were available only for a subsample of 233 respondents). EDA-8 scores were not related to parental reports of ASD severity.

Conclusions: Inspection of the test information function suggests that the EDA-8 may be a useful tool for identifying children on the autism spectrum who show an extreme response to demands, as a starting point for more in-depth assessment.


2021 ◽  
pp. 001316442199121
Author(s):  
Guher Gorgun ◽  
Okan Bulut

In low-stakes assessments, some students may not reach the end of the test and leave some items unanswered for various reasons (e.g., lack of test-taking motivation, poor time management, or test speededness). Not-reached items are often treated as incorrect or as not-administered in the scoring process. However, when the proportion of not-reached items is high, these traditional approaches may yield biased scores, thereby threatening the validity of test results. In this study, we propose a polytomous scoring approach for handling not-reached items and compare its performance with that of the traditional scoring approaches. Real data from a low-stakes math assessment administered to second and third graders were used. The assessment consisted of 40 short-answer items focusing on addition and subtraction. The students were instructed to answer as many items as possible within 5 minutes. Under the traditional scoring approaches, students’ responses to not-reached items were treated as either not-administered or incorrect. Under the proposed approach, students’ nonmissing responses were scored polytomously based on how accurately and how rapidly they responded, reducing the impact of not-reached items on ability estimation. The traditional and polytomous scoring approaches were compared on several evaluation criteria, such as model fit indices, the test information function, and bias. The results indicated that the polytomous scoring approaches outperformed the traditional approaches. A complete-case simulation corroborated our empirical findings: the approach in which nonmissing items were scored polytomously and not-reached items were treated as not-administered performed best. Implications of the polytomous scoring approach for low-stakes assessments are discussed.
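A toy illustration of the contrast between the scoring rules (the speed cutoff and point values here are hypothetical, not the authors' rubric): not-reached items are either coded as incorrect or dropped as not-administered, while attempted items can be rescored polytomously from accuracy and response time:

```python
def score_responses(responses, times, fast_cutoff=10.0):
    """Contrast two ways of scoring a test with not-reached items.

    `responses` holds True/False for attempted items and None for
    not-reached items; `times` holds response times in seconds.
    The cutoff and point values are illustrative only.
    """
    # Traditional rule: not-reached items scored as incorrect (0).
    incorrect_scored = [0 if r is None else int(r) for r in responses]

    # Proposed rule: not-reached items treated as not-administered (None);
    # attempted items scored polytomously from accuracy and speed
    # (2 = correct and rapid, 1 = correct but slow, 0 = incorrect).
    polytomous = []
    for r, t in zip(responses, times):
        if r is None:
            polytomous.append(None)   # not administered
        elif r and t <= fast_cutoff:
            polytomous.append(2)      # correct and rapid
        elif r:
            polytomous.append(1)      # correct but slow
        else:
            polytomous.append(0)      # incorrect
    return incorrect_scored, polytomous
```

For a student who answers the first item quickly and correctly, misses the second, and never reaches the third, the incorrect-scored vector is [1, 0, 0] while the polytomous vector is [2, 0, None]; only the second rule stops the unreached item from dragging the ability estimate down.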


2020 ◽  
Author(s):  
John Harmon Wolfe

This paper explores generative adaptive testing using the digit span test as an example. A large-sample study of computer-generated and computer-administered digit-span items on Navy recruits showed an almost perfect correlation (.98–.99) between digit span length and IRT difficulty. Predicted IRT parameters can therefore be used for adaptive testing with items generated in real time. Our results suggest that the best research strategy for developing generative adaptive tests may be to start with the most elementary cognitive tasks and then build toward more complete psychometric models of complex mental tasks. The results are sufficiently encouraging that the same research approach should be tried with other forms of memory span tests and with more complex working memory tests, including tests for figures, colors, and words. The paper advances the conjecture that the test information function of a generative CAT system has a mathematical relationship to the model fit and the distribution of the model-specified item parameters, independent of the content domain of the test.
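The near-perfect relationship between span length and difficulty is what makes on-the-fly generation workable: predict each candidate length's difficulty, then generate the item most informative at the current ability estimate. A sketch, assuming a hypothetical linear length-to-difficulty mapping (the slope and intercept below are illustrative, not estimates from the study):

```python
import math
import random

def predicted_difficulty(span_length, slope=0.9, intercept=-5.0):
    """Hypothetical linear mapping from digit-span length to IRT difficulty,
    standing in for the near-perfect (.98-.99) relationship reported."""
    return slope * span_length + intercept

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def next_span_item(theta, lengths=range(3, 12)):
    """Pick the span length most informative at theta, then generate digits."""
    best = max(lengths, key=lambda n: item_information(theta, predicted_difficulty(n)))
    digits = [random.randint(0, 9) for _ in range(best)]
    return best, digits
```

Because the difficulty of an unseen item is predictable from its structure alone, no item bank or pre-calibration of individual items is needed, which is the core appeal of the generative CAT approach.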


2020 ◽  
Vol 78 (2) ◽  
pp. 196-214
Author(s):  
Friyatmi Friyatmi ◽  
Djemari Mardapi ◽  
Haryanto Haryanto

The development of an economics HOTS test that combines critical thinking, creative thinking, and problem-solving skills is essential to meet the challenges of 21st-century life skills. The combination of these thinking skills in the HOTS test helps teachers diagnose students' strengths and weaknesses. However, it could compromise the accuracy of the measurement results if analyzed with unidimensional IRT in a single analysis. Multidimensional Item Response Theory (MIRT) resolves these accuracy issues. This research aimed to develop an economics HOTS test to estimate students' higher-order thinking abilities using MIRT. The samples were 750 high school students selected from fourteen high schools in West Sumatera, Indonesia. The data were collected using tests calibrated through the simple-structure MIRT model in RStudio. Test reliability was calculated from coefficient alpha and the test information function. The results show that MIRT offers accurate measurement in estimating multidimensional test parameters. The items had moderate average multidimensional discrimination and difficulty, while the students had moderate HOTS ability. Their ability to think creatively was lower than their critical thinking and problem-solving abilities. The test proved reliable, with a coefficient alpha of 0.81; it yielded a high test information function of 4.0124 and a low measurement error of 0.4992. It is suitable for students who have moderate abilities in problem-solving and critical thinking, but high creative thinking ability. Keywords: HOTS, critical thinking, creative thinking, problem solving, MIRT.
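The two reported precision figures are internally consistent: the conditional standard error of measurement is the reciprocal square root of the test information function, so the reported information of 4.0124 implies the reported error of about 0.4992. A quick check:

```python
import math

# SEM(theta) = 1 / sqrt(I(theta)): the reported measurement error
# follows directly from the reported test information function.
sem = 1.0 / math.sqrt(4.0124)
# sem ≈ 0.4992, matching the reported low measurement error
```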


This study aimed at revealing the effect of the number of alternatives in multiple-choice tests on the item and test information functions under the three-parameter model of item response theory. To achieve this objective, a multiple-choice achievement test was constructed for the second part of the mathematics subject for 10th-grade students in public schools in the capital city of Amman. The final test consists of 38 items, prepared in three forms that differ only in the number of alternatives per item. The study sample consisted of 1,530 students. The results showed statistically significant differences in reliability in favor of the five-alternative form, as well as the four-alternative form; the results also showed no statistically significant differences between the arithmetic means of the information function due to the number of item alternatives.
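Under the 3PL model, the number of alternatives enters through the pseudo-guessing parameter c, often approximated as 1/k for k alternatives; a larger c depresses item information at a given ability. A sketch with illustrative discrimination and difficulty values (not parameters from this study):

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """3PL item information: a^2 * (P - c)^2 / (1 - c)^2 * (1 - P) / P."""
    p = p_3pl(theta, a, b, c)
    return a * a * ((p - c) ** 2 / (1.0 - c) ** 2) * ((1.0 - p) / p)

# Approximating guessing as 1/k for k alternatives: fewer alternatives
# mean a larger c and hence less information, other parameters equal.
info_3 = info_3pl(0.0, 1.0, 0.0, 1.0 / 3.0)  # three alternatives
info_5 = info_3pl(0.0, 1.0, 0.0, 1.0 / 5.0)  # five alternatives
```

At theta = b this reduces to a^2 * (1 - c) / (4 * (1 + c)), which makes explicit how guessing alone, with all else held constant, lowers the information a five- versus three-alternative item can supply.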


2018 ◽  
Vol 3 (1) ◽  
pp. 44
Author(s):  
Wahyu Widhiarso

Abstract. The unit of measurement is not always the item; it can also be a group of items (a testlet). This paper demonstrates the development of a measure using testlets, an approach rarely applied in Indonesia. The example used is the development of a measure of visual ability, one of several tests included in the AJT COGTEST. In this test, items are grouped into testlets on the basis of the figure they reference, since different items refer to different figures. The test consists of fifteen figures, and each figure is accompanied by seven items. The data analysis technique used is the Rasch model. A comparison of psychometric properties shows the advantages of the testlet over the item as the unit of analysis: data generated from testlets tend to be unidimensional and free of local dependence, and show higher discrimination and better model fit than when the item is the unit of analysis. A comparison of test information functions indicates that using testlets increases the test information function. The paper also presents the general concept of testlets and their application through the Winsteps program in the development of measurement instruments. (The original Indonesian abstract, a translation of the above, is merged here.) Keywords: Measurement Unit; Rasch Model; Testlet


2018 ◽  
Vol 17 (1) ◽  
pp. 1
Author(s):  
Farida Agus Setiawati ◽  
Rita Eka Izzaty ◽  
Veny Hidayat

This study aims to analyze the characteristics of the Scholastic Aptitude Test (SAT), consisting of both verbal and numerical subtests. We used a descriptive quantitative approach, describing the characteristics of the SAT in terms of item difficulty, item discrimination, pseudo-guessing, the test information function, and the standard error of measurement. The data are responses to the SAT instrument, collected from 1,047 subjects in Yogyakarta using the documentation technique. Data were then analyzed with the Item Response Theory (IRT) approach using the BILOG program for all logistic parameter models, preceded by checking item fit to each model. The analysis concludes that the verbal subtest tends to fit the 2-PL and 3-PL models, while the numerical subtest fits only the 2-PL model. The majority of SAT items have good characteristics in terms of item difficulty, item discrimination, and pseudo-guessing, and, based on the test information function, the SAT is accurate for use under the 1-PL, 2-PL, and 3-PL IRT models at all ability levels.

