Screening Test Items for Differential Item Functioning

2014 ◽  
Vol 39 (1) ◽  
pp. 3-21 ◽  
Author(s):  
Nicholas T. Longford


Author(s):  
Abdul Wahab Ibrahim

The study used statistical procedures based on Item Response Theory to detect Differential Item Functioning (DIF) in polytomous tests, with a view to improving the quality of test item construction. The sample consisted of an intact class of 513 Part 3 undergraduate students who registered for the course EDU 304: Tests and Measurement at Sule Lamido University during the 2017/2018 second semester. A self-developed polytomous research instrument was used to collect data, which were analysed using the Generalized Mantel-Haenszel procedure, the Simultaneous Item Bias Test, and Logistic Discriminant Function Analysis. The results showed no significant relationship between the proportions of test items that functioned differentially in the polytomous test when the different statistical methods were used. Further, the three parametric and non-parametric methods complemented each other in their ability to detect DIF in the polytomous test format: all of them could detect DIF, but they performed differently. The study concluded that there was a high degree of correspondence between the three procedures in their ability to detect DIF in polytomous tests, and recommended that test experts and developers consider using procedures based on Item Response Theory for DIF detection.
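Of the three families of procedures compared above, the logistic-regression flavour of DIF screening is the easiest to sketch. The following is a minimal, hypothetical illustration for a dichotomous item (the function names, coding, and toy data are our own, not the study's instrument): model the probability of a correct response from the matching score and group membership; a group coefficient far from zero flags potential uniform DIF.

```python
import math

def fit_logistic(X, y, lr=0.3, steps=6000):
    """Plain batch gradient-descent logistic regression (stdlib only)."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1 / (1 + math.exp(-z))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def uniform_dif_coefficient(rows):
    """rows: (matching_score, group, correct) triples, group coded 0/1.

    Fits P(correct) ~ intercept + score + group and returns the group
    coefficient; a value far from zero suggests uniform DIF.
    """
    X = [[1.0, float(s), float(g)] for s, g, _ in rows]
    y = [float(c) for _, _, c in rows]
    return fit_logistic(X, y)[2]
```

With real data one would also add a score-by-group interaction term to screen for non-uniform DIF, and use proper standard errors for significance testing; this sketch only shows the shape of the procedure.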


2021 ◽  
Author(s):  
John Marc Goodrich ◽  
Natalie Koziol ◽  
HyeonJin Yoon

When measuring academic skills among students whose primary language is not English, standardized assessments are often provided in languages other than English (Tabaku, Carbuccia-Abbott, & Saavedra, 2018). The degree to which alternate-language test items function equivalently must be evaluated, but traditional methods of investigating measurement equivalence may be confounded by group differences on characteristics other than ability level and language form. The primary purposes of this study were to investigate differential item functioning (DIF) and item bias across Spanish and English forms of an assessment of early mathematics skills. Secondary purposes were to investigate the presence of selection bias and demonstrate a novel approach for investigating DIF that uses a regression discontinuity design framework to control for selection bias. Data were drawn from 1,750 Spanish-speaking Kindergarteners participating in the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99, who were administered either the Spanish or English version of the mathematics assessment based on their performance on an English language screening measure. Results indicated a minority of items functioned differently across the Spanish and English forms, and subsequent item content scrutiny indicated no plausible evidence of item bias. Evidence of selection bias—differences between groups in SES, age, and country of birth, in addition to mathematics ability and form language—highlighted limitations of a traditional approach for investigating DIF that only controlled for ability. Fewer items exhibited DIF when controlling for selection bias (11% vs. 25%), and the type and direction of DIF differed upon controlling for selection bias.


2018 ◽  
Vol 12 (4) ◽  
pp. 5
Author(s):  
Andreas Alm Fjellborg ◽  
Lena Molin

What types of test items benefit students who follow the syllabus in Swedish as a second language? A study using data from the Swedish national assessments in geography.

Abstract
Pupils born outside Sweden tend to achieve less than native pupils, primarily as a result of weaker knowledge of the Swedish language. Based on a statistical analysis (differential item functioning) of questions given in the national tests in geography (2014-2017), it was possible to identify questions on which pupils following the syllabus of Swedish as a second language attain considerably better, or considerably worse, results than expected. Earlier research has shown that pupils whose native language is not Swedish find it particularly hard to comprehend geographic concepts, which was confirmed by the present study. The study furthermore revealed that, in particular, questions containing a limited amount of text concerning geographic concepts resulted in larger differences than expected between native pupils following the syllabus in Swedish and foreign-born pupils following the syllabus in Swedish as a second language. These findings could aid teachers and test constructors in their efforts to adjust teaching and tests by not formulating questions that measure irrelevant background factors, which might affect the pupils' ability to answer questions adequately relative to their level of knowledge.

Keywords: National tests in geography, question format, pupils born outside Sweden, Swedish-born pupils, DIF analysis


2005 ◽  
Vol 74 ◽  
pp. 135-145
Author(s):  
Tamara van Schilt-Mol ◽  
Ton Vallen ◽  
Henny Uiterwijk

Previous research has shown that the Dutch 'Final Test of Primary Education' contains a number of unintentionally difficult, and therefore unwanted, test items, leading to Differential Item Functioning (DIF) for immigrant minority students whose parents' dominant language is Turkish or Arabic/Berber. Two statistical procedures were used to identify DIF items in the Final Test of 1997. Subsequently, five experiments were conducted to detect causes of DIF, yielding a number of hypotheses concerning possible linguistic, cultural, and textual sources. These hypotheses were used to revise original DIF items into intentionally DIF-free items. The article discusses three possible sources of DIF: (1) the use of fixed (misleading) answer options and (2) of misleading illustrations (both to the disadvantage of the minority students), and (3) the fact that questions concerning the past tense often lead to DIF (to their advantage).


1996 ◽  
Vol 21 (3) ◽  
pp. 187-201 ◽  
Author(s):  
Rebecca Zwick ◽  
Dorothy T. Thayer

Several recent studies have investigated the application of statistical inference procedures to the analysis of differential item functioning (DIF) in polytomous test items that are scored on an ordinal scale. Mantel’s extension of the Mantel-Haenszel test is one of several hypothesis-testing methods for this purpose. The development of descriptive statistics for characterizing DIF in polytomous test items has received less attention. As a step in this direction, two possible standard error formulas for the polytomous DIF index proposed by Dorans and Schmitt were derived. These standard errors, as well as associated hypothesis-testing procedures, were evaluated through application to simulated data. The standard error that performed better is based on Mantel’s hypergeometric model. The alternative standard error, based on a multinomial model, tended to yield values that were too small.
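As context for Mantel's extension discussed above, the dichotomous Mantel-Haenszel common odds ratio that it generalizes can be computed in a few lines. This is an illustrative sketch with our own function name and toy counts, not the paper's simulated data:

```python
from collections import defaultdict

def mh_odds_ratio(responses):
    """Mantel-Haenszel common odds ratio across matching strata.

    responses: iterable of (group, stratum, correct) with group in
    {"ref", "focal"} and correct in {0, 1}; strata are usually
    total-score levels. alpha > 1 favours the reference group.
    """
    # one 2x2 table per stratum: rows = group, cols = incorrect/correct
    tables = defaultdict(lambda: [[0, 0], [0, 0]])
    for group, stratum, correct in responses:
        row = 0 if group == "ref" else 1
        tables[stratum][row][correct] += 1
    num = den = 0.0
    for t in tables.values():
        b, a = t[0]          # reference group: incorrect, correct
        d, c = t[1]          # focal group: incorrect, correct
        n = a + b + c + d
        if n:
            num += a * d / n
            den += b * c / n
    return num / den
```

In the ETS convention, this ratio is often reported on the delta scale as MH D-DIF = -2.35 ln(alpha).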


2021 ◽  
Author(s):  
Ben Stenhaug ◽  
Michael C. Frank ◽  
Benjamin Domingue

Differential item functioning (DIF) is a popular technique within the item-response theory framework for detecting test items that are biased against particular demographic groups. The last thirty years have brought significant methodological advances in detecting DIF. Still, typical methods—such as matching on sum scores or identifying anchor items—are based exclusively on internal criteria and therefore rely on a crucial piece of circular logic: items with DIF are identified via an assumption that other items do not have DIF. This logic is an attempt to solve an easy-to-overlook identification problem at the beginning of most DIF detection. We explore this problem, which we describe as the Fundamental DIF Identification Problem, in depth here. We suggest three steps for determining whether it is surmountable and DIF detection results can be trusted. (1) Examine raw item response data for potential DIF. To this end, we introduce a new graphical method for visualizing potential DIF in raw item response data. (2) Compare the results of a variety of methods. These methods, which we describe in detail, include commonly-used anchor item methods, recently-proposed anchor point methods, and our suggested adaptations. (3) Interpret results in light of the possibility of DIF methods failing. We illustrate the basic challenge and the methodological options using the classic verbal aggression data and a simulation study. We recommend best practices for cautious DIF detection.
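The identification problem the authors describe can be made concrete with a short Rasch-model calculation (a toy illustration with made-up parameters, not the verbal aggression data): a uniform ability shift in the focal group and uniform DIF on every item yield identical response probabilities, so no method that relies only on internal criteria can distinguish the two stories.

```python
import math

def p_correct(theta, b):
    """Rasch model probability of a correct response."""
    return 1 / (1 + math.exp(-(theta - b)))

difficulties = [-1.0, 0.0, 1.0]

# Story 1: focal-group ability is 0.5 lower, all items unbiased.
shifted_ability = [p_correct(0.0 - 0.5, b) for b in difficulties]
# Story 2: equal ability, but every item is 0.5 harder for the
# focal group (uniform DIF on all items).
uniform_dif = [p_correct(0.0, b + 0.5) for b in difficulties]

# Identical focal-group response probabilities in both stories.
assert shifted_ability == uniform_dif
```

This is why anchor items (or some other external constraint) are needed: the data alone cannot fix the scale on which DIF is measured.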


2021 ◽  
Vol 20 (1) ◽  
pp. 55-62
Author(s):  
Anthony Pius Effiom

This study used an Item Response Theory approach to assess Differential Item Functioning (DIF) and detect item bias in a Mathematics Achievement Test (MAT). The MAT was administered to 1,751 SS2 students in public secondary schools in Cross River State. An instrumentation research design was used to develop and validate the 50-item instrument. Data were analysed using the maximum likelihood estimation technique of the BILOG-MG V3 software. The results revealed that 6% of the items exhibited differential item functioning between male and female students, indicating sex bias in some of the test items in the MAT. DIF analysis, which seeks to eliminate irrelevant factors and sources of bias of any kind so that a test yields valid results, is among the best methods currently available. Test developers and policymakers are therefore recommended to exercise care in fair test practice by dedicating effort to unbiased test development and decision making. Examination bodies should adopt Item Response Theory in educational testing, and test developers should be mindful of test items that can cause biased response patterns between male and female students or any other subgroup of interest. Keywords: Assessment, Differential Item Functioning, Validity, Reliability, Test Fairness, Item Bias, Item Response Theory.
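The IRT view of DIF used in the study above can be pictured with item characteristic curves: under a 2PL model, an item shows uniform DIF when the groups' difficulty parameters differ, so examinees of equal ability have unequal success probabilities. A minimal sketch with hypothetical parameters (not the MAT's calibrated values):

```python
import math

def icc(theta, a, b):
    """2PL item characteristic curve: P(correct | ability theta)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Hypothetical parameters for one flagged item: same discrimination,
# different difficulty for male vs female examinees -> uniform DIF.
p_male = icc(0.0, a=1.2, b=-0.3)
p_female = icc(0.0, a=1.2, b=0.4)
```

At theta = 0 the item is easier for the male group even though both examinees have the same ability, which is exactly what a DIF flag on the difficulty parameter captures.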


2002 ◽  
Vol 11 (3) ◽  
pp. 274-284 ◽  
Author(s):  
Carol Scheffner Hammer ◽  
Maria Pennock-Roman ◽  
Sarah Rzasa ◽  
J. Bruce Tomblin

The purpose of this research was to examine the Test of Language Development-P:2 (TOLD-P:2; Newcomer & Hammill, 1991) for item bias. The TOLD-P:2 was administered to 235 African American and 1,481 White kindergarten children living in the Midwest. Test items were examined for evidence of differential item functioning (DIF) using inferential and descriptive methods. Sixteen percent of all items of the TOLD-P:2 were found to have DIF. Of these items, 75% were found to be harder for the African American group. The percentages of items on the five core subtests identified as having DIF were as follows: Picture Vocabulary, 17%; Oral Vocabulary, 17%; Grammatic Understanding, 12%; Sentence Imitation, 20%; and Grammatic Completion, 13%. The implications of these findings are discussed in relation to the TOLD-P:3.

