Using Standards and Empirical Evidence to Develop Academic English Proficiency Test Items in Reading

Author(s):  
Alison L. Bailey
Robin Stevens
Frances A. Butler
Becky Huang
Judy N. Miyoshi

2019
Vol 0 (0)
Author(s):
Betül Hazal Dinçer
Elena Antonova-Unlu
Alper Kumcu

Abstract: The use of translation for language teaching and assessment has, by and large, been abandoned with the adoption of audio-lingual and communicative approaches in language teaching. As a result, translation items are not commonly used to measure language proficiency in international proficiency tests (e.g. TOEFL, IELTS). However, several countries still use translation items in their national language proficiency tests (e.g. Turkey, Japan, China, and Romania, among others). The present study examines whether multiple-choice translation items are an appropriate tool for measuring proficiency in English. To this end, the perceived difficulty and the validity of the multiple-choice translation items in the National English Proficiency Test (YDS) in Turkey were examined. The findings revealed that the participants did significantly better on the translation items than on the rest of the test items, and that they perceived the translation items as the easiest of all the items in the YDS. Moreover, while the YDS as a whole showed strong validity based on its correlation with a TOEFL PBT Reading sample test, the translation items showed only moderate validity, and the difference between the two correlations was significant. These findings suggest that multiple-choice translation items are likely to lower the overall validity of YDS tests and inflate test-takers' scores, and thus might be considered problematic for the quality of the tests.
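The abstract reports a significant difference between the whole-test and the translation-item correlations with the TOEFL reading score, without stating which statistical test was used. A common way to compare two correlation coefficients is the Fisher r-to-z method; the sketch below illustrates it for two independent samples, with hypothetical correlation values and sample sizes (not figures from the study):

```python
import math

def fisher_z(r: float) -> float:
    """Fisher r-to-z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r1: float, n1: int, r2: float, n2: int):
    """Two-sided z test for the difference between two independent
    correlation coefficients, using the Fisher z method."""
    z1, z2 = fisher_z(r1), fisher_z(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided p-value
    return z, p

# Hypothetical values: a "strong" whole-test correlation vs. a
# "moderate" translation-item correlation, 100 test takers each.
z, p = compare_correlations(0.80, 100, 0.45, 100)
print(z, p)
```

Note that correlations computed on the same sample against the same criterion are dependent, so a study would more properly use a test for dependent overlapping correlations (e.g. Steiger's test); the independent-sample version above is only the simplest illustration.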


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Endrati Jati Siwi ◽  
Rosyita Anindyarini ◽  
Sabiqun Nahar

Yureka Education Center (YEC) is one of the institutions that has developed an online English proficiency test, the English Proficiency Online Test (EPOT), which follows the TOEFL ITP (Institutional Testing Program) framework. This study aimed to analyze the characteristics of the EPOT instrument, consisting of Listening, Structure, and Reading subtests, and to identify the quality of each EPOT test item. The study used a descriptive quantitative approach, describing the characteristics of the EPOT test items in terms of item difficulty index, item discrimination index, test information function, and measurement error. The data were collected through EPOT trials taken by 2,652 online test takers from 20 provinces in Indonesia. The collected data were then analyzed with the Item Response Theory (IRT) approach using the BILOG program on all logistic parameter models, beginning with a test of item fit to each model. Based on the results of the analysis, it can be concluded that all subtests fit the 3-PL model. Most of EPOT's test items had a good range of difficulty and discrimination indices, and the test information function showed that the items measured accurately under the 3-PL model for a certain ability range. This study was expected to show that EPOT can be used as an alternative English proficiency test that is easy to use and useful.
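The 3-PL model and the item information function mentioned in the abstract can be sketched as follows; the item parameters here (a = discrimination, b = difficulty, c = guessing) are hypothetical, not actual EPOT estimates from the BILOG analysis:

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """3-PL probability of a correct response at ability theta,
    with discrimination a, difficulty b, and guessing parameter c."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c, D=1.7):
    """Item information function for the 3-PL model: information
    peaks near b and shrinks as c grows."""
    p = p_3pl(theta, a, b, c, D)
    q = 1 - p
    return (D * a) ** 2 * ((p - c) / (1 - c)) ** 2 * (q / p)

# Hypothetical item: a=1.2, b=0.5, c=0.2.
for theta in (-2.0, 0.0, 0.5, 2.0):
    print(theta, round(p_3pl(theta, 1.2, 0.5, 0.2), 3),
          round(info_3pl(theta, 1.2, 0.5, 0.2), 3))
```

At theta = b the response probability is c + (1 - c)/2, which is why guessing raises the floor of the item characteristic curve; summing `info_3pl` over the items of a subtest gives the test information function the abstract refers to.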


Author(s):  
Mohammad Reza Ghorbani
Hadi Abbassi
Abu Bakar Mohamed Razali

The Iranian Ministry of Science, Research, and Technology (MSRT) English Proficiency Test (EPT) has been in use since 1992. While the MSRT-EPT is generally claimed to be reliable, valid, and practical, it does not assess speaking and writing skills. In this exploratory study, a qualitative approach was used to examine MSRT-EPT test-takers' experiences and language education experts' beliefs about the test, as well as their congruence with each other, through semi-structured telephone interviews. Convenience and purposive sampling procedures were used to select 15 participants. An inductive coding method was applied to determine invariant constituents; the constituents were then reduced to categories, and the categories were clustered into 11 themes. The dependability and validity of the study were established through triangulation, inter-coder agreement, and member checking. The problems associated with the MSRT-EPT, beyond its lack of productive skills, included a lack of correspondence between the test content and Ph.D. candidates' needs, a negative washback effect, non-theory-based content, inappropriate listening conditions, and a lack of originality in test items. The candidates' and experts' perspectives were highly congruent. In light of these findings, the importance of designing a more comprehensive test covering all facets of the language proficiency construct was highlighted, and some suggestions were made for future research.


2018
Vol 2 (01)
Author(s):
Besin Gaspar
Yenny Hartanto

Recently, university students have been required by their institutions to obtain a TOEFL score in the first year or in the last year of their study before graduation, while some other higher education institutions require their students to submit a TOEIC score instead. Companies, in the recruitment process, require applicants to submit a TOEFL score to show their level of English proficiency. The first question is which test is more appropriate for job applicants in a company: TOEFL or TOEIC. Another question for university students is whether to take the TOEFL in the first year or in the last year before graduation. This article aims to answer these two questions. The first part gives an overview of the various versions of TOEFL and TOEIC, and the second part proposes the appropriate English proficiency test for each setting: TOEIC for company recruitment of new employees, and TOEFL for universities and colleges.


2015
Vol 117 (1)
pp. 1-36
Author(s):
Maria Araceli Ruiz-Primo
Min Li

Background A long-standing premise in test design is that contextualizing test items makes them concrete, less demanding, and more conducive to determining whether students can apply or transfer their knowledge. Purpose We assert that despite decades of study and experience, much remains to be learned about how to construct effective and fair test items with contexts. Too little is known about how item contexts can be appropriately constructed and used, and even less about the relationship between context characteristics and student performance. The exploratory study presented in this paper seeks to contribute to knowledge about test design and construction by focusing on this gap. Research Design We address two key questions: (a) What are the characteristics of contexts used in the PISA science items? and (b) What are the relationships between different context characteristics and student performance? We propose a profiling approach to capture information about six context dimensions: type of context, context role, complexity, resources, level of abstraction, and connectivity. To test the approach empirically, we sampled a total of 52 science items from PISA 2006 and 2009. We describe the context characteristics of the items at two levels (called layers): general (testlet context) and specific (item context). Conclusion We provide empirical evidence about the relationships of these characteristics with student performance as measured by the international percentage of correct responses. We found that the dimension of context resources (e.g., pictures, drawings, photographs) for general contexts and the level of abstraction for specific contexts are associated with student performance.
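An association between an ordinal context dimension (such as level of abstraction) and the international percentage of correct responses can be quantified with a simple correlation across items. The coding and percent-correct values below are invented for illustration and are not PISA data:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sxy / (sx * sy)

# Hypothetical coding: abstraction level (1 = concrete ... 3 = abstract)
# and international percent correct for six illustrative items.
abstraction = [1, 1, 2, 2, 3, 3]
pct_correct = [72, 68, 55, 60, 41, 38]
print(round(pearson(abstraction, pct_correct), 2))
```

With an ordinal coding like this, a rank-based coefficient (Spearman) would be the more defensible choice in practice; Pearson is used here only to keep the sketch short.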


2021
Author(s):
Yi-Ting Chen

The homogeneity of a product or sample affects whether it meets its intended scope of application and purpose. For example, reference materials (RMs) produced by a reference material producer (RMP) and proficiency test items selected by a proficiency testing provider (PTP) should be shown to have a certain degree of homogeneity to ensure that they have consistent characteristics and are comparable. Before performing a homogeneity assessment, however, it is necessary to measure the characteristic parameters of the reference materials or proficiency test items to obtain a sufficient number of measured values for data analysis, and outliers among the measured values may affect the analysis and the interpretation of the results. This article therefore draws on ASTM E178-16a:2016 [1], ISO 5725-2:1994 [2], ISO 13528:2015 [3], and related documents to introduce several outlier detection and homogeneity assessment methods, supplemented by case studies. Finally, the article notes precautions for the use of each method, so that readers can choose the appropriate method for their actual analysis.
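As one example of the ISO 13528:2015 approach the article refers to, here is a minimal sketch of an Annex-B-style homogeneity check from duplicate measurements of proficiency test items, accepting the batch when the between-sample standard deviation does not exceed 0.3 σ_pt (the standard deviation for proficiency assessment). The duplicate data are invented for illustration:

```python
import statistics

def homogeneity_check(duplicates, sigma_pt):
    """ISO 13528-style homogeneity assessment from duplicate
    measurements of g proficiency-test items.
    duplicates: list of (x1, x2) result pairs, one pair per item.
    Returns the between-sample standard deviation s_s and whether
    it satisfies s_s <= 0.3 * sigma_pt."""
    means = [(a + b) / 2 for a, b in duplicates]
    diffs = [a - b for a, b in duplicates]
    s_x = statistics.stdev(means)                                 # sd of item means
    s_w = (sum(d * d for d in diffs) / (2 * len(diffs))) ** 0.5   # within-item sd
    s_s = max(0.0, s_x ** 2 - s_w ** 2 / 2) ** 0.5                # between-item sd
    return s_s, s_s <= 0.3 * sigma_pt

# Hypothetical duplicate results for 5 items, with sigma_pt = 1.0.
dups = [(10.1, 10.0), (10.2, 10.1), (9.9, 10.0), (10.0, 10.1), (10.1, 10.2)]
s_s, ok = homogeneity_check(dups, 1.0)
print(round(s_s, 3), ok)
```

Outlying item means would typically be screened first (e.g. with a Grubbs test per ASTM E178) before the between-sample standard deviation is interpreted, since a single outlier can inflate s_s and fail an otherwise homogeneous batch.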

