Post exam item analysis: Implication for intervention

2019 ◽  

Abstract Post exam item analysis enables teachers to reduce bias in student achievement assessments and to improve their instruction. Difficulty indices, discrimination power, and distractor efficiency are the measures most commonly investigated in item analysis. This research investigated the difficulty and discrimination indices, distractor efficiency, whole-test reliability, and construct defects in a summative test for a freshman common course at Gondar CTE. In this study, 176 exam papers were analyzed in terms of difficulty index, point-biserial correlation, and distractor efficiency. Internal consistency reliability and construct defects such as meaningless stems, punctuation errors, and inconsistencies in option formats were also investigated. Results revealed that the summative test as a whole had a moderate difficulty level (0.56 ± 0.20) and good distractor efficiency (85.71% ± 29%). However, the exam was poor in terms of discrimination power (0.16 ± 0.28) and internal consistency reliability (KR-20 = 0.58). Only one item showed good discrimination power, and one more showed excellent discrimination. About 41.9% of the items were either too easy or too difficult. Inconsistent option formats or inappropriate options, punctuation errors, and meaningless stems were also observed. Thus, future test development interventions should give due emphasis to item reliability, discrimination coefficients, and item construct defects.

Author(s):  
Anupama Jena ◽  
Mahesh Chander ◽  
Sushil K. Sinha

In the present study, a test was developed to measure the knowledge level of dairy farmers about scientific dairy farming. A preliminary set of 87 knowledge items was administered to 60 randomly selected dairy farmers for item analysis. The difficulty index and discrimination index were computed, and items with a difficulty index ranging from 30 to 80 and a discrimination index ranging from 0.30 to 0.55 were included in the final format of the knowledge test. A total of 48 items fulfilled both criteria and were selected for the final format of the knowledge test. Reliability of the test estimated by the split-half method was 0.386, and the coefficient of correlation by the test-retest method was 0.452, which was significant at the 1% level of significance. Hence, the knowledge test constructed was considered stable, reliable, and valid for measuring what it intends to measure.
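The screening rule described above (retain items with a difficulty index of 30-80 and a discrimination index of 0.30-0.55) amounts to a simple filter; the item tuples below are invented placeholders, not study data.

```python
# Sketch of the item-selection rule: keep items whose difficulty index is
# in [30, 80] and whose discrimination index is in [0.30, 0.55].
items = [
    # (item_id, difficulty_index_percent, discrimination_index)
    ("Q1", 45, 0.42),
    ("Q2", 92, 0.10),   # too easy and poorly discriminating -> rejected
    ("Q3", 70, 0.35),
    ("Q4", 25, 0.50),   # too difficult -> rejected
    ("Q5", 60, 0.20),   # discriminates poorly -> rejected
]

selected = [
    item_id
    for item_id, dif, di in items
    if 30 <= dif <= 80 and 0.30 <= di <= 0.55
]
print(selected)  # only Q1 and Q3 pass both criteria
```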


2007 ◽  
Vol 101 (2) ◽  
pp. 519-524 ◽  
Author(s):  
D. Gabrielle Jones-Wiley ◽  
Alberto F. Restori ◽  
Howard B. Lee

A measure of attitudes toward war was administered to 125 student participants at a California university to assess the psychometric properties of the scale for possible use in current research. A 5-point scale was substituted for the 2-point scale originally used. Item analysis indicated that 23 of the 32 items were viable. Using Cronbach's reliability coefficient α and factor analysis, the shortened measure had an internal consistency reliability of .85. Factor analysis yielded a 4-factor structure: (1) War is Bad, (2) War is Necessary, (3) Positive Aspects of War, and (4) No Justification. These results indicate that this seemingly outdated measure of war attitudes remains useful for current research involving attitudes toward war. However, longitudinal research is necessary.
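Cronbach's α, used above to estimate internal consistency, has a short closed form: α = k/(k-1) · (1 - Σ item variances / variance of total scores). Here is a minimal sketch on an invented matrix of 5-point responses (rows = respondents, columns = items); the numbers are illustrative only.

```python
# Invented 5-point Likert responses: rows = respondents, columns = items.
scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(matrix):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(totals))."""
    k = len(matrix[0])
    item_vars = sum(variance([row[j] for row in matrix]) for j in range(k))
    total_var = variance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - item_vars / total_var)
```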


2020 ◽  
Vol 26 (3) ◽  
pp. 322-331
Author(s):  
Hyo-Suk Song ◽  
So-Hee Lim

Purpose: The purpose of this study was to investigate the validity and reliability of the Korean version of the Grit scale (Grit-K) for nursing students in Korea. Methods: The participants in the study were 277 nursing students, whose grit was assessed by a self-report questionnaire. The Grit scale was translated into Korean and its content validity was verified by five experts. The validity of the instrument was verified through item analysis, exploratory factor analysis, and confirmatory factor analysis. Reliability was analyzed using internal consistency reliability. Results: Two factors were identified through exploratory factor analysis, and six items of the original instrument were found to be valid. In the confirmatory factor analysis, the model fit supported the validity of the instrument. The internal consistency reliability was also acceptable, and the Grit-K was found to be an applicable instrument. Conclusion: This study shows that the Korean version of the Grit questionnaire is a valid and reliable instrument for assessing nursing students in Korea.


2019 ◽  
Author(s):  
Assad Ali Rezigalla ◽  
Elwathiq Khalid Ibrahim ◽  
Amar Babiker ElHussein

Abstract Background Distractor efficiency of multiple-choice item responses is a component of item analysis used by examiners to evaluate the credibility and functionality of distractors. Objective To evaluate the impact of the functionality (efficiency) of distractors on difficulty and discrimination indices. Methods A cross-sectional study in which standard item analysis was performed on an 80-item test consisting of A-type MCQs. Correlation and significance of variance among the difficulty index (DIF), discrimination index (DI), and distractor efficiency (DE) were measured. Results There was a significant moderate positive correlation between the difficulty index and distractor efficiency, meaning a high difficulty index tends to go with high distractor efficiency (and vice versa). There was a weak positive correlation between distractor efficiency and the discrimination index. Conclusions Non-functional distractors can reduce the discrimination power of multiple-choice questions. More training and effort in constructing plausible options for MCQ items are essential for the validity and reliability of tests.
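Distractor efficiency as commonly defined in item analysis treats a distractor as functional when at least 5% of examinees choose it, and DE is the share of functional distractors. A minimal sketch follows, with invented option counts and the conventional 5% threshold assumed (the abstract does not state the threshold used):

```python
# Distractor efficiency sketch: a distractor is "functional" if chosen by
# at least `threshold` (fraction) of examinees; DE = % of functional
# distractors among all distractors for the item.
def distractor_efficiency(option_counts, key, threshold=0.05):
    """option_counts: dict option -> count of examinees choosing it;
    key: the correct option. Returns DE as a percentage."""
    total = sum(option_counts.values())
    distractors = [opt for opt in option_counts if opt != key]
    functional = [
        opt for opt in distractors
        if option_counts[opt] / total >= threshold
    ]
    return 100 * len(functional) / len(distractors)

counts = {"A": 52, "B": 20, "C": 2, "D": 26}  # "A" is the key
print(distractor_efficiency(counts, key="A"))  # C is non-functional -> 66.67
```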


2021 ◽  
Vol 108 (Supplement_7) ◽  
Author(s):  
Michelle Smigielski ◽  
Scott Mackenzie ◽  
Owen Dent ◽  
Anna Giles

Abstract Aims Acute appendicitis is a common emergency surgical presentation, accounting for approximately 30,000 emergency operations annually in Australia. There is a need for a clinical grading tool that can quickly and reliably assess the severity of appendicitis at the time of surgery and predict the difficulty of the operation. Methods Over 12 months, 111 questionnaires relating to the difficulty of the laparoscopic appendicectomy operation, anatomical and pathological features, time taken, and need for senior assistance were completed by surgeons and trainees of varying seniority. A scale of operative difficulty was constructed by the method of summated ratings. The final scale was generated using further item analysis, principal components analysis, internal consistency reliability analysis, and concurrent validity analysis. Results A scale of 8 anatomical and pathological features that predict the difficulty of laparoscopic appendicectomy was formed. These include the presence of acute adhesions; gross pathology of the appendix; quality of the appendix, mesoappendix, appendix base, and retroperitoneum; visibility of the appendix artery; and adherence of the appendix to adjacent structures. The scale has high internal consistency reliability, and both the individual items and the scale have been validated by comparison with the surgeons' perceived difficulty, the length of the operation, and the need to call for senior assistance. Conclusions This appendicectomy grading scale, based entirely on readily identifiable laparoscopic findings, is able to predict the difficulty of the operation and can be used to facilitate operative planning as well as to improve criteria for assessing operative competency in trainee surgeons.


2016 ◽  
Vol 24 (3) ◽  
pp. 388-398 ◽  
Author(s):  
Maaidah M. Algamdi ◽  
Sandra K. Hanneman

Purpose: The study aims were to (a) test reliability of the Arabic versions of the Cancer Behavior Inventory-Brief Arabic (CBI-BA) among patients diagnosed with any type of cancer and the Functional Assessment of Cancer Therapy-Breast (FACT-BA) in women with breast cancer and (b) assess participant understanding of CBI-BA items. Methods: A cross-sectional design was used to assess preliminary evidence for internal consistency reliability of the CBI-BA and the FACT-BA in a community-dwelling sample of Arabic-speaking persons diagnosed with cancer. Participants were randomly selected for cognitive interview. Results: Cronbach’s alphas were ≥.76 for the CBI-BA, .91 for the FACT-BA, and .43–.89 for the FACT-BA subscales. Cognitive interviews revealed several CBI-BA items required revision. Conclusion: The total CBI-BA and the FACT-BA scales have adequate internal consistency reliability estimates.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Deborah L. Davis ◽  
Debra K. Creedy ◽  
Zoe Bradfield ◽  
Elizabeth Newnham ◽  
Marjorie Atchan ◽  
...  

Abstract Background Woman-centred care is recognised as a fundamental construct of midwifery practice yet to date, there has been no validated tool available to measure it. This study aims to develop and test a self-report tool to measure woman-centred care in midwives. Methods A staged approach was used for tool development including deductive methods to generate items, testing content validity with a group of experts, and psychometrically testing the instrument with a sample drawn from the target audience. The draft 58 item tool was distributed in an online survey using professional networks in Australia and New Zealand. Testing included item analysis, principal components analysis with direct oblimin rotation and subscale analysis, and internal consistency reliability. Results In total, 319 surveys were returned. Analysis revealed five factors explaining 47.6% of variance. Items were reduced to 40. Internal consistency (.92) was high but varied across factors. Factors reflected the extent to which a midwife meets the woman’s unique needs; balances the woman’s needs within the context of the maternity service; ensures midwifery philosophy underpins practice; uses evidence to inform collaborative practice; and works in partnership with the woman. Conclusion The Woman-Centred Care Scale-Midwife Self Report is the first step in developing a valid and reliable tool to enable midwives to self-assess their woman-centredness. Further research in alternate populations and refinement is warranted.


2020 ◽  
Vol 19 (1) ◽  
Author(s):  
Surajit Kundu ◽  
Jaideo M Ughade ◽  
Anil R Sherke ◽  
Yogita Kanwar ◽  
Samta Tiwari ◽  
...  

Background: Multiple-choice questions (MCQs) are the most widely accepted tool for the evaluation of comprehension, knowledge, and application among medical students. Single-best-response MCQs (items) can assess a high order of cognition in students. It is essential to develop valid and reliable MCQs, as flawed items will interfere with unbiased assessment. The present paper attempts to discuss the art of framing well-structured items, drawing on the references provided. This article puts forth a practice for committed medical educators to improve the skill of forming quality MCQs through enhanced Faculty Development Programs (FDPs). Objectives: The objective of the study is also to test the quality of MCQs by item analysis. Methods: In this study, 100 MCQs from set I or set II were distributed to 200 MBBS students of Late Shri Lakhiram Agrawal Memorial Govt. Medical College Raigarh (CG) for item analysis. Set I and set II were MCQs formed by 60 medical faculty before and after FDP, respectively. All MCQs had a single stem with three incorrect options and one correct answer. The data were entered in Microsoft Excel 2016 for analysis. The difficulty index (DIF I), discrimination index (DI), and distractor efficiency (DE) were the item analysis parameters used to evaluate the impact of adhering to the guidelines for framing MCQs. Results: The mean difficulty index, discrimination index, and distractor efficiency were 56.54%, 0.26, and 89.93%, respectively. Among the 100 items, 14 were of higher difficulty level (DIF I < 30%), 70 were of moderate category, and 16 were of easy level (DIF I > 60%). A total of 10 items had very good DI (≥ 0.40), 32 had recommended values (0.30 - 0.39), and 25 were acceptable with changes (0.20 - 0.29). Of the 100 MCQs, 27 had a DE of 66.66% and 11 had a DE of 33.33%.
Conclusions: In this study, higher cognitive-domain MCQs increased after training, recurrent-type MCQs decreased, and MCQs with item-writing flaws were reduced; these changes were statistically significant. Nine MCQs satisfied all the criteria of item analysis.
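The difficulty bands used in the Results above (DIF I < 30% difficult, 30-60% moderate, > 60% easy) can be captured in a small classifier; this is an illustrative sketch, not the authors' code.

```python
# Difficulty-index bands as reported in the abstract above:
# < 30% -> difficult, 30-60% -> moderate, > 60% -> easy.
def classify_difficulty(dif_percent):
    """Map a difficulty index (percent correct) to its category."""
    if dif_percent < 30:
        return "difficult"
    if dif_percent <= 60:
        return "moderate"
    return "easy"

# The study's mean DIF I of 56.54% falls in the moderate band.
print(classify_difficulty(56.54))  # moderate
```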

