The Effect of Two Scoring Methods on Multiple Choice Agricultural Science Test Scores

We evaluated the use of the nominal response model (NRM) to score multiple-choice (also known as “select the best option”) situational judgment tests (SJTs). Using data from two large studies, we compared the reliability and external correlations of NRM scores with those from several classical and item response theory (IRT) scoring methods. The SJTs measured emotional management (Study 1) and teamwork and collaboration (Study 2). In Study 1, the NRM scoring method outperformed three classical test theory–based methods and four other IRT-based methods, yielding both higher reliability and higher correlations with external measures. In Study 2, the scoring methods differed only slightly. One explanation for the discrepancy is that when item keys are ambiguous (as in Study 1), the NRM accommodates that ambiguity, whereas when item keys are clear (as in Study 2), the different methods yield interchangeable scores. We characterize ambiguous and clear keys using category response curves based on NRM parameter estimates and discuss how our findings relate to the wisdom-of-the-crowd literature.
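The abstract does not state the model's equation. In its standard form, the NRM gives the probability that an examinee at trait level θ selects option k of an item as a softmax over option-specific slopes a_k and intercepts c_k: P(X = k | θ) = exp(a_k θ + c_k) / Σ_h exp(a_h θ + c_h). The Python sketch below traces category response curves for two four-option items whose parameters are invented for illustration and are not estimates from either study: one with a single dominant option (a clear key) and one in which two options rise together with θ (an ambiguous key). The function and constant names are hypothetical.

```python
import math


def nrm_probs(theta, slopes, intercepts):
    """Option probabilities under the nominal response model.

    P(X = k | theta) = exp(a_k * theta + c_k) / sum_h exp(a_h * theta + c_h)
    """
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    top = max(logits)  # subtract the largest logit to avoid overflow in exp
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters (slopes sum to zero, a common identification
# constraint); these are illustrative values, not estimates from the studies.
CLEAR_KEY = {"slopes": [1.5, -0.5, -0.5, -0.5], "intercepts": [0.0, 0.0, 0.0, 0.0]}
AMBIGUOUS_KEY = {"slopes": [1.0, 0.9, -0.9, -1.0], "intercepts": [0.0, 0.0, 0.0, 0.0]}

if __name__ == "__main__":
    for name, item in [("clear", CLEAR_KEY), ("ambiguous", AMBIGUOUS_KEY)]:
        print(name)
        for theta in (-2, -1, 0, 1, 2):
            probs = nrm_probs(theta, item["slopes"], item["intercepts"])
            print(f"  theta={theta:+d}: " + "  ".join(f"{p:.2f}" for p in probs))
```

At high θ the clear-key item places nearly all probability on the first option, whereas the ambiguous item splits it between the first two options; a scoring model that keeps separate parameters for every option can exploit that second pattern, while a right/wrong key cannot.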


Recommendation for English multiple-choice cloze questions based on expected test scores

International Journal of Knowledge-based and Intelligent Engineering Systems, Vol 15 (1), pp. 15-24, 2011. DOI: 10.3233/kes-2010-0209

Author(s): Tomoharu Iwata, Tomoko Kojiri, Takeshi Yamada, Toyohide Watanabe

Keyword(s): Test Scores, Multiple Choice


The effect of repetition on multiple choice test scores.

Large scale Rorschach techniques: A manual for the group Rorschach and multiple choice tests (2nd ed., 2nd printing), pp. 156-160, 2012. DOI: 10.1037/13988-019

Author(s): M. R. Harrower, M. E. Steiner

Keyword(s): Test Scores, Multiple Choice, Choice Test, Multiple Choice Test


Examination of the Quality of Multiple-choice Items on Classroom Tests

The Canadian Journal for the Scholarship of Teaching and Learning, Vol 2 (2), 2011. DOI: 10.5206/cjsotl-rcacea.2011.2.4 (cited by ~26)

Author(s): David DiBattista, Laura Kurzawa

Keyword(s): Test Scores, Item Analysis, Multiple Choice, Discriminatory Power, Multiple Choice Tests, Choice Tests, Discrimination Coefficient, Multiple Choice Items

Because multiple-choice testing is so widespread in higher education, we assessed the quality of items used on classroom tests by carrying out a statistical item analysis. We examined undergraduates’ responses to 1198 multiple-choice items on sixteen classroom tests in various disciplines. The mean item discrimination coefficient was +0.25, with more than 30% of items having unsatisfactory coefficients less than +0.20. Of the 3819 distractors, 45% were flawed either because less than 5% of examinees selected them or because their selection was positively rather than negatively correlated with test scores. In three tests, more than 40% of the items had an unsatisfactory discrimination coefficient, and in six tests, more than half of the distractors were flawed. Discriminatory power suffered dramatically when the selection of one or more distractors was positively correlated with test scores, but it was only minimally affected by the presence of distractors that were selected by less than 5% of examinees. Our findings indicate that there is considerable room for improvement in the quality of many multiple-choice tests. We suggest that instructors consider improving the quality of their multiple-choice tests by conducting an item analysis and by modifying distractors that impair the discriminatory power of items.
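The abstract reports a discrimination coefficient and two distractor flaws (selected by less than 5% of examinees, or selection positively correlated with test scores) without giving formulas. The Python sketch below applies those two thresholds (+0.20 and 5%) to a small response matrix. It assumes, since the abstract does not say, that the discrimination coefficient is the corrected item-total (point-biserial) correlation and that distractor correlations are taken against the rest score (total minus the item); the function names are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cov / (sx * sy)


def item_analysis(responses, key, options="ABCD", min_disc=0.20, min_share=0.05):
    """Flag weakly discriminating items and flawed distractors.

    responses: one list of chosen options per examinee, e.g. ["A", "C", "B"]
    key: the correct option for each item
    """
    scored = [[int(resp[i] == k) for i, k in enumerate(key)] for resp in responses]
    totals = [sum(row) for row in scored]
    report = []
    for i, k in enumerate(key):
        item = [row[i] for row in scored]
        rest = [t - s for t, s in zip(totals, item)]  # total score without item i
        disc = pearson(item, rest)  # assumed: corrected item-total correlation
        flagged = []
        for opt in options:
            if opt == k:
                continue
            chose = [int(resp[i] == opt) for resp in responses]
            # Flaw 1: rarely chosen. Flaw 2: chosen more often by higher scorers.
            if mean(chose) < min_share or pearson(chose, rest) > 0:
                flagged.append(opt)
        report.append({
            "item": i,
            "discrimination": disc,
            "low_discrimination": disc < min_disc,
            "flagged_distractors": flagged,
        })
    return report
```

A distractor flagged for positive correlation is the case the abstract links to a dramatic loss of discriminatory power; one flagged only for rarity is, by the same findings, a minor concern.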


Classroom Learning as a Function of Method of Presenting Instructional Materials

Psychological Reports, Vol 19 (3), pp. 971-977, 1966. DOI: 10.2466/pr0.1966.19.3.971 (cited by ~7)

Author(s): E. Vaughn Gulo, M. R. Nigro

Keyword(s): Test Scores, Multiple Choice, Instructional Materials, Choice Test, Television Instruction, Multiple Choice Test, Classroom Learning

In two experiments, the efficiency of programmed, television, and conventional textbook instruction was compared. Ss were randomly assigned to one of three groups: one worked through a standard programmed text, one read the same material in conventional textbook form, and one listened to and watched a verbatim video-taped lecture of the programmed material. A 30-item multiple-choice test was administered immediately after instruction (Exps. I and II; Ns = 160, 134) or 1 wk. later (Exp. II). The results show only a tendency for Ss who simply read the material in conventional textbook form to earn higher criterion test scores than Ss in either the programmed or the television instruction group. The results were therefore interpreted as consistent with the frequent contention that differences in effectiveness among methods of instruction are negligible or, at best, slight.


Empirical Estimates of Intercorrelations among the Components of Scores on Multiple-Choice Tests

Psychological Reports, Vol 19 (2), pp. 651-654, 1966. DOI: 10.2466/pr0.1966.19.2.651 (cited by ~1)

Author(s): Donald W. Zimmerman, Richard H. Williams, Hubert H. Rehm, William Elmore

Keyword(s): College Students, Computer Program, Test Scores, Multiple Choice, Error Component, Independent Components, Multiple Choice Tests, Error Components, Empirical Estimates, Choice Tests

College students were instructed to indicate on various multiple-choice tests whether they “knew the answer” or “guessed” each item, and the results were treated as estimated true and error components of scores. The values of the intercorrelations of these components were similar to those given by a computer program described previously. The values found for all tests were consistent with the assumption that test scores consist of both independent and non-independent components of error and that the non-independent error component is relatively large.


Variability of Deviation IQ's Based on Multiple Choice Test Scores

Educational and Psychological Measurement, Vol 45 (4), pp. 745-751, 1985. DOI: 10.1177/0013164485454005 (cited by ~1)

Author(s): Donald W. Zimmerman

Keyword(s): Test Scores, Multiple Choice, Choice Test, Multiple Choice Test
