Bayesian analysis of hierarchical IRT models: comparing and combining the unidimensional and multi-unidimensional IRT models

2005 ◽  
Author(s):  
Yanyan Sheng

As item response theory models gain popularity in large-scale educational and measurement testing situations, many studies have been conducted on the development and application of unidimensional and multidimensional models. However, to date, no study has examined models in the IRT framework with an overall ability dimension underlying all test items and several ability dimensions specific to each subtest. This study proposes such a model and compares it with the conventional IRT models using Bayesian methodology. The results suggest that the proposed model better represents test situations not captured by existing models. The specifications of the proposed model also have implications for test developers regarding test design. In addition, the proposed IRT model can be applied in other areas, such as intelligence or psychology.

2018 ◽  
Vol 29 (1) ◽  
pp. 35-44
Author(s):  
Nell Sedransk

This article is about FMCSA data and its analysis. The article responds to the two-part question: How does an Item Response Theory (IRT) model work differently . . . or better than any other model? The response to the first part is a careful, completely non-technical exposition of the fundamentals of IRT models. It differentiates IRT models from other models by providing the rationale underlying IRT modeling and by using graphs to illustrate two key properties of data items. The response to the second part of the question, about the superiority of an IRT model, is, “it depends.” For FMCSA data, serious challenges arise from the complexity of the data and from the heterogeneity of the carrier industry. Questions are posed that will need to be addressed to determine the success of the actual model developed and of the scoring system.


2021 ◽  
pp. 43-48
Author(s):  
Rosa Fabbricatore ◽  
Francesco Palumbo

Evaluating learners' competencies is a crucial concern in education, and structured tests taken at home or in the classroom are an effective assessment tool. Structured tests consist of sets of items that can refer to several abilities or more than one topic. Several statistical approaches allow students to be evaluated in a multidimensional way that accounts for the structure of the items. Depending on the ultimate aim of the evaluation, the assessment process either assigns a final grade to each student or clusters students into homogeneous groups according to their level of mastery and ability. The latter is a helpful tool for developing tailored recommendations and remediation for each group. To this end, latent class models are a standard reference. In the item response theory (IRT) paradigm, multidimensional latent class IRT models relax the traditional constraints of both unidimensionality and a continuous latent trait, which allows sub-populations of students with homogeneous proficiency levels to be detected while also accounting for the multidimensional nature of their ability. Moreover, the semi-parametric formulation has several practical advantages: it avoids normality assumptions that may not hold and reduces the computational burden. This study compares the results of the multidimensional latent class IRT models with those obtained by a two-step procedure, which consists of first fitting a multidimensional IRT model to estimate students' ability and then applying a clustering algorithm to classify students accordingly. For the clustering step, parametric and non-parametric approaches were considered. The data come from the admission test for the degree course in psychology administered in 2014 at the University of Naples Federico II. The sample comprised N=944 students, and their ability dimensions were defined according to the domains assessed by the entrance exam, namely Humanities, Reading and Comprehension, Mathematics, Science, and English. In particular, a multidimensional two-parameter logistic IRT model for dichotomously scored items was used to estimate students' ability.
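
For readers unfamiliar with the multidimensional two-parameter logistic model named in the abstract above, the following is a minimal sketch of its response function in the common compensatory slope-intercept form. It is not taken from the article: the function name `m2pl_probability` and the parameter names `theta`, `a`, and `d` are illustrative assumptions, and the article's exact parameterization may differ.

```python
import math


def m2pl_probability(theta, a, d):
    """Probability of a correct response under a compensatory M2PL model.

    theta: list of latent abilities, one per dimension (assumed name)
    a:     list of item discriminations, one per dimension (assumed name)
    d:     scalar item intercept (assumed name)
    """
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same number of dimensions")
    # Linear predictor: weighted sum of abilities plus the item intercept.
    z = sum(a_k * theta_k for a_k, theta_k in zip(a, theta)) + d
    # Logistic link maps the predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))
```

With all abilities and the intercept at zero, the function returns 0.5; a higher ability on any dimension with a positive discrimination raises the probability.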


2006 ◽  
Vol 31 (1) ◽  
pp. 63-79 ◽  
Author(s):  
Henry May

A new method is presented and implemented for deriving a scale of socioeconomic status (SES) from international survey data using a multilevel Bayesian item response theory (IRT) model. The proposed model incorporates both international anchor items and nation-specific items and is able to (a) produce student family SES scores that are internationally comparable, (b) reduce the influence of irrelevant national differences in culture on the SES scores, and (c) effectively and efficiently deal with the problem of missing data in a manner similar to Rubin’s (1987) multiple imputation approach. The results suggest that this model is superior to conventional models in terms of its fit to the data and its ability to use information collected via international surveys.


2019 ◽  
Vol 45 (3) ◽  
pp. 339-368 ◽  
Author(s):  
Chun Wang ◽  
Steven W. Nydick

Recent work on measuring growth with categorical outcome variables has combined the item response theory (IRT) measurement model with the latent growth curve model and extended the assessment of growth to multidimensional IRT models and higher order IRT models. However, there is a lack of synthetic studies that clearly evaluate the strengths and limitations of different multilevel IRT models for measuring growth. This study aims to introduce the various longitudinal IRT models, including the longitudinal unidimensional IRT model, longitudinal multidimensional IRT model, and longitudinal higher order IRT model, which cover a broad range of applications in education and social science. Following a comparison of the parameterizations, identification constraints, strengths, and weaknesses of the different models, a real data example is provided to illustrate the application of different longitudinal IRT models to model students’ growth trajectories on multiple latent abilities.


2020 ◽  
Vol 80 (5) ◽  
pp. 975-994
Author(s):  
Yoonsun Jang ◽  
Allan S. Cohen

A nonconverged Markov chain can potentially lead to invalid inferences about model parameters. The purpose of this study was to assess the effect of a nonconverged Markov chain on the estimation of parameters for mixture item response theory models using a Markov chain Monte Carlo algorithm. A simulation study was conducted to investigate the accuracy of model parameters estimated with different degrees of convergence. Results indicated that the accuracy of the estimated model parameters for the mixture item response theory models decreased as the number of iterations of the Markov chain decreased. In particular, increasing the number of burn-in iterations resulted in more accurate estimation of mixture IRT model parameters. In addition, the different methods for monitoring convergence of a Markov chain resulted in different degrees of convergence despite almost identical accuracy of estimation.
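
As an illustration of what monitoring convergence can look like in practice, the sketch below computes the Gelman-Rubin potential scale reduction factor for one parameter from several chains. This is one widely used diagnostic, chosen here as an assumption; the abstract does not say which monitoring methods the study compared, and the function name `gelman_rubin_rhat` is hypothetical.

```python
import math
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Potential scale reduction factor for a single parameter.

    chains: list of equal-length lists of post-burn-in draws, one per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    # W: average of the within-chain sample variances.
    within = mean([variance(c) for c in chains])
    if within == 0:
        raise ValueError("within-chain variance is zero")
    # B: n times the sample variance of the chain means.
    between = n * variance([mean(c) for c in chains])
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)
```

Values near 1 suggest the chains are sampling from a similar distribution, while values well above 1 indicate that the chains have not yet mixed. A common rule of thumb flags values above roughly 1.1, although the threshold is a convention rather than a guarantee.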


2019 ◽  
Vol 80 (3) ◽  
pp. 461-475
Author(s):  
Lianne Ippel ◽  
David Magis

In the dichotomous item response theory (IRT) framework, the asymptotic standard error (ASE) is the most common statistic for evaluating the precision of various ability estimators. Easy-to-use ASE formulas are readily available; however, the accuracy of some of these formulas was recently questioned and new ASE formulas were derived from a general asymptotic theory framework. Furthermore, exact standard errors were suggested to better evaluate the precision of ability estimators, especially with short tests for which the asymptotic framework is invalid. Unfortunately, the accuracy of exact standard errors has so far been assessed only in a very limited setting. The purpose of this article is to perform a global comparison of exact versus (classical and new formulations of) asymptotic standard errors, for a wide range of usual IRT ability estimators, IRT models, and with short tests. Results indicate that exact standard errors globally outperform the ASE versions in terms of reduced bias and root mean square error, while the new ASE formulas are also globally less biased than their classical counterparts. Further discussion about the usefulness and practical computation of exact standard errors is outlined.


2018 ◽  
Vol 79 (3) ◽  
pp. 462-494 ◽  
Author(s):  
Ken A. Fujimoto

Advancements in item response theory (IRT) have led to models for dual dependence, which control for cluster and method effects during a psychometric analysis. Currently, however, this class of models does not include one that controls for when the method effects stem from two method sources in which one source functions differently across the aspects of another source (i.e., a nested method–source interaction). For this study, then, a Bayesian IRT model is proposed, one that accounts for such interaction among method sources while controlling for the clustering of individuals within the sample. The proposed model accomplishes these tasks by specifying a multilevel trifactor structure for the latent trait space. Details of simulations are also reported. These simulations demonstrate that this model can identify when item response data represent a multilevel trifactor structure, and it does so in data from samples as small as 250 cases nested within 50 clusters. Additionally, the simulations show that misleading estimates for the item discriminations could arise when the trifactor structure reflected in the data is not correctly accounted for. The utility of the model is also illustrated through the analysis of empirical data.


Author(s):  
Dani Gamerman ◽  
Tufi M. Soares ◽  
Flávio Gonçalves

This article discusses the use of a Bayesian model that incorporates differential item functioning (DIF) in analysing whether cultural differences may affect the performance of students from different countries on the various test items that make up the OECD’s Programme for International Student Assessment (PISA) test of mathematics ability. The PISA tests in mathematics and other subjects are used to compare the educational attainment of fifteen-year-old students in different countries. The article first provides a background on PISA, DIF and item response theory (IRT) before describing a hierarchical three-parameter logistic model for the probability of a correct response on an individual item to determine the extent of DIF remaining in the mathematics test of 2003. The results of Bayesian analysis illustrate the importance of appropriately accounting for all sources of heterogeneity present in educational testing and highlight the advantages of the Bayesian paradigm when applied to large-scale educational assessment.
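
The three-parameter logistic model mentioned in the abstract above adds a guessing parameter to the usual logistic response function. The sketch below shows only the standard, non-hierarchical form without any DIF terms; it is not the article's model, and the function name `three_pl_probability` and its parameter names are assumptions.

```python
import math


def three_pl_probability(theta, a, b, c):
    """Probability of a correct response under the standard 3PL model.

    theta: examinee ability
    a:     item discrimination
    b:     item difficulty
    c:     guessing (lower asymptote), expected in [0, 1)
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("guessing parameter c must be in [0, 1)")
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    # The guessing parameter lifts the lower asymptote from 0 to c.
    return c + (1.0 - c) * logistic
```

When ability equals difficulty, the probability is halfway between the guessing floor and 1, for example 0.6 when `c` is 0.2.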


2020 ◽  
Vol 44 (7-8) ◽  
pp. 563-565
Author(s):  
Hwanggyu Lim ◽  
Craig S. Wells

The R package irtplay provides practical tools for unidimensional item response theory (IRT) models that conveniently enable users to conduct many analyses related to IRT. For example, irtplay includes functions for calibrating online items, scoring test-takers’ proficiencies, evaluating IRT model-data fit, and importing item and/or proficiency parameter estimates from the output of popular IRT software. In addition, the irtplay package supports mixed-item formats consisting of dichotomous and polytomous items.

