The Importance of Sample Weights and Plausible Values in Large-Scale Assessments

Author(s): Serkan ARIKAN, Ferah ÖZER, Vuslat ŞEKER, Güneş ERTAŞ


Diagnostica, 2017, Vol 63 (3), pp. 193-205
Author(s): Oliver Lüdtke, Alexander Robitzsch

Abstract. Measurements used in psychological research to assess constructs are usually affected by measurement error. These measurement errors lead to biased estimates of population parameters and of their standard errors. In recent decades, the plausible-values technique has become established in large-scale assessments as a method for correcting error-prone relationships between latent variables and observed covariates. Using a simple example from classical test theory, this article introduces this complex statistical procedure. It is shown that alternative methods for estimating person scores generally yield biased estimates of population-level relationships. A simulation study extends these findings to an IRT model for dichotomous indicators. From a diagnostic perspective, it is emphasized that plausible values should not be used to estimate individual ability levels. Finally, methodological challenges in applying the plausible-values technique and its potential for psychological research are discussed.
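The attenuation effect described in the abstract above can be illustrated with a minimal simulation under classical test theory. This is a sketch with all parameters assumed known (an operational PV-generating model would estimate the latent regression from the data): the correlation between error-laden observed scores and a covariate is attenuated, whereas plausible values drawn from the correct posterior recover the latent correlation.

```python
# Sketch: plausible values recover a latent correlation that observed
# scores attenuate. Illustrative simulation; the latent-regression
# parameters are treated as known, which a real PV model would estimate.
import numpy as np

rng = np.random.default_rng(42)
n, M = 100_000, 10          # persons, number of plausible values

# Latent regression: theta = 0.5*z + residual, Var(theta) = 1
z = rng.standard_normal(n)
b, res_var = 0.5, 1 - 0.5**2
theta = b * z + np.sqrt(res_var) * rng.standard_normal(n)

# Observed score with measurement error (reliability = 0.5)
err_var = 1.0
x = theta + np.sqrt(err_var) * rng.standard_normal(n)

# Posterior of theta given x and z (normal-normal conjugacy)
post_var = 1 / (1 / err_var + 1 / res_var)
post_mean = post_var * (x / err_var + b * z / res_var)

# Draw M plausible values and average the per-PV correlations
pv_corrs = []
for _ in range(M):
    pv = post_mean + np.sqrt(post_var) * rng.standard_normal(n)
    pv_corrs.append(np.corrcoef(pv, z)[0, 1])

print(f"true corr(theta, z): {np.corrcoef(theta, z)[0, 1]:.3f}")
print(f"observed corr(x, z): {np.corrcoef(x, z)[0, 1]:.3f}")  # attenuated
print(f"PV-based corr:       {np.mean(pv_corrs):.3f}")        # recovered
```

Because each plausible value is a draw from the correct posterior, the joint distribution of (PV, z) matches that of (theta, z), which is why the PV-based correlation is unbiased while the observed-score correlation is shrunk toward zero.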


Mathematics, 2021, Vol 9 (13), pp. 1579
Author(s): Juan Aparicio, Jose M. Cordero, Lidia Ortiz

International large-scale assessments (ILSAs) report educational outcomes through several measures, the so-called plausible values, which are frequently interpreted as representing the range of abilities each student might plausibly have. In this paper, we focus on how this information should be incorporated into the estimation of efficiency measures of student or school performance using data envelopment analysis (DEA). Previous studies that adopted this approach with ILSA data have used only one of the available plausible values or the average of all of them. We propose an approach based on fuzzy DEA, which allows us to consider the whole distribution of results as a proxy for student abilities. To assess the extent to which our proposal yields results similar to those of previous studies, we provide an empirical example using PISA data from 2015. Our results suggest that the performance measures estimated with the fuzzy DEA approach are strongly correlated with measures calculated using just one plausible value or an average measure. We therefore conclude that studies opting for either of those simpler alternatives do not appear to incur a significant error in their estimates.
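The paper's finding that one plausible value and the PV average yield strongly correlated efficiency scores can be sketched in a toy setting. In the special case of a single input and a single output under constant returns to scale, CCR DEA efficiency reduces to the output/input ratio normalized by the best ratio, so no linear programming is needed. The data below are simulated stand-ins, not PISA data, and the fuzzy DEA extension is not implemented here.

```python
# Sketch: single-input/single-output CCR DEA (constant returns to
# scale), where efficiency is the output/input ratio divided by the
# frontier ratio. Compares scores from one plausible value against
# scores from the PV average; inputs and abilities are simulated.
import numpy as np

rng = np.random.default_rng(7)
n_schools, n_pv = 200, 10

inputs = rng.uniform(1.0, 2.0, n_schools)    # e.g. spending per student
ability = rng.normal(500, 100, n_schools)    # latent achievement
# Plausible values: ability plus simulated posterior spread
pvs = ability[:, None] + rng.normal(0, 30, (n_schools, n_pv))

def dea_ccr_1in_1out(x, y):
    """CCR efficiency with one input and one output: ratio to the best ratio."""
    ratio = y / x
    return ratio / ratio.max()

eff_one_pv = dea_ccr_1in_1out(inputs, pvs[:, 0])
eff_mean_pv = dea_ccr_1in_1out(inputs, pvs.mean(axis=1))

r = np.corrcoef(eff_one_pv, eff_mean_pv)[0, 1]
print(f"corr(one-PV, mean-PV efficiency) = {r:.3f}")
```

With PV spread small relative to between-school differences, the two sets of scores correlate highly, mirroring the paper's conclusion that the choice between one PV and the PV average matters little for the estimates.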


Author(s): Clemens M. Lechner, Nivedita Bhaktha, Katharina Groskurth, Matthias Bluemke

Abstract. Measures of cognitive or socio-emotional skills from large-scale assessment surveys (LSAS) are often based on advanced statistical models and scoring techniques unfamiliar to applied researchers. Consequently, applied researchers working with data from LSAS may be uncertain about the assumptions and computational details of these statistical models and scoring techniques and about how best to incorporate the resulting skill measures in secondary analyses. The present paper is intended as a primer for applied researchers. After a brief introduction to the key properties of skill assessments, we give an overview of the three principal methods with which secondary analysts can incorporate skill measures from LSAS in their analyses: (1) as test scores (i.e., point estimates of individual ability), (2) through structural equation modeling (SEM), and (3) in the form of plausible values (PVs). We discuss the advantages and disadvantages of each method based on three criteria: fallibility (i.e., control for measurement error and unbiasedness), usability (i.e., ease of use in secondary analyses), and immutability (i.e., consistency of test scores, PVs, or measurement model parameters across different analyses and analysts). We show that although none of the methods is optimal under all criteria, methods that yield a single point estimate of each respondent's ability (i.e., all types of "test scores") are rarely optimal for research purposes. Instead, approaches that avoid or correct for measurement error, especially PV methodology, stand out as the methods of choice. We conclude with practical recommendations for secondary analysts and data-producing organizations.
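When PVs are used in a secondary analysis, the analysis is typically run once per plausible value and the results combined with Rubin's pooling rules (point estimates averaged; total variance = within-imputation variance plus an inflated between-imputation component). A minimal sketch with simulated data; the PV columns and the true slope of 0.4 are illustrative assumptions, not values from any LSAS file.

```python
# Sketch: pooling a regression slope across M plausible values with
# Rubin's combining rules, as done in PV-based secondary analyses.
# Data are simulated; in practice each PV column comes from the survey file.
import numpy as np

rng = np.random.default_rng(1)
n, M = 5_000, 10
z = rng.standard_normal(n)                         # observed covariate
theta = 0.4 * z + rng.standard_normal(n)           # latent skill
pvs = theta[:, None] + rng.normal(0, 0.6, (n, M))  # simulated plausible values

def ols_slope(y, x):
    """Slope of y on x and its standard error (simple OLS with intercept)."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    resid = y - y.mean() - b * (x - x.mean())
    se = np.sqrt(resid.var(ddof=2) / (np.var(x, ddof=1) * (x.size - 1)))
    return b, se

slopes, ses = zip(*(ols_slope(pvs[:, m], z) for m in range(M)))
q_bar = np.mean(slopes)                        # pooled point estimate
w = np.mean(np.square(ses))                    # within-imputation variance
b_var = np.var(slopes, ddof=1)                 # between-imputation variance
total_se = np.sqrt(w + (1 + 1 / M) * b_var)    # Rubin's total variance
print(f"pooled slope = {q_bar:.3f} (SE {total_se:.3f})")
```

The between-imputation term is what propagates the measurement uncertainty carried by the PVs into the final standard error; fitting the model to a single PV (or to the PV average) would understate that uncertainty.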

