A Corpus-based Evaluation of a High-stakes EFL Exam

Document Type: Original Article

Authors

Department of Foreign Languages and Linguistics, Shiraz University, Shiraz, Iran

Abstract

High-stakes assessments play a significant role in people’s lives, and their results largely determine individuals’ future social and financial prospects. Corpus linguistics has recently been used to inform the development and validation of such tests. This study aimed to identify the degree of typicality of the vocabulary items tested in the English proficiency subtest of the Master of Arts/Science Iranian University Entrance Exam. To this end, the vocabulary options and collocations in 20 test versions were extracted, and their frequency of occurrence in the Corpus of Contemporary American English was examined using a specially written computer program. The results indicated that the academic genre did not account for as large a share of the options’ frequencies as would be expected in a test designed for academic purposes. The findings also revealed inconsistencies in option frequencies across the parallel test versions. Furthermore, some options and collocations proved atypical, occurring zero or close to zero times in the corpus. The study therefore recommends supplementing test developers’ intuition with frequency information from corpora and various wordlists to make vocabulary assessment more robust.
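The abstract does not describe the frequency-checking program itself. As a rough illustration of the kind of computation involved, the Python sketch below takes per-genre hit counts for each option, computes the share of hits falling in the academic genre, finds each option's most frequent genre, flags options with zero or near-zero hits as atypical, and summarizes each parallel test version so the versions can be compared. This is a hypothetical sketch, not the authors' program: the function names, the genre labels (modeled on COCA's traditional five genres), the cut-off `max_hits=2`, and every count are assumptions, and the counts are invented toy numbers rather than COCA data. Retrieving real frequencies from COCA is not shown.

```python
from statistics import mean

# HYPOTHETICAL sketch. Genre labels are modeled on COCA's traditional five
# genres; all counts used below are invented toy numbers, NOT real COCA
# frequencies. Comparing raw counts across genres assumes the genre
# subcorpora are of similar size; otherwise normalize per million words.
GENRES = ("spoken", "fiction", "magazine", "newspaper", "academic")


def total_hits(counts: dict[str, int]) -> int:
    """Total corpus hits for one item across all genres (missing genre = 0)."""
    return sum(counts.get(g, 0) for g in GENRES)


def academic_share(counts: dict[str, int]) -> float:
    """Fraction of an item's hits that occur in the academic genre."""
    total = total_hits(counts)
    return counts.get("academic", 0) / total if total else 0.0


def dominant_genre(counts: dict[str, int]) -> str:
    """Genre with the most hits (ties go to the earlier genre in GENRES)."""
    return max(GENRES, key=lambda g: counts.get(g, 0))


def atypical(items: dict[str, dict[str, int]], max_hits: int = 2) -> list[str]:
    """Items with zero or near-zero hits; max_hits is an assumed cut-off."""
    return sorted(name for name, c in items.items() if total_hits(c) <= max_hits)


def version_profile(
    versions: dict[str, dict[str, dict[str, int]]]
) -> dict[str, tuple[float, float]]:
    """Mean total hits and mean academic share of the options in each
    parallel test version, so versions can be compared for consistency."""
    return {
        name: (
            mean(total_hits(c) for c in items.values()),
            mean(academic_share(c) for c in items.values()),
        )
        for name, items in versions.items()
    }


# Invented toy data for two hypothetical test versions (NOT COCA figures).
versions = {
    "version_A": {
        "option_1": {"spoken": 120, "fiction": 80, "magazine": 150,
                     "newspaper": 140, "academic": 60},
        "option_2": {"academic": 1},
    },
    "version_B": {
        "option_3": {"spoken": 10, "fiction": 5, "magazine": 20,
                     "newspaper": 15, "academic": 200},
    },
}

for version, items in versions.items():
    for option, counts in items.items():
        print(version, option, dominant_genre(counts),
              round(academic_share(counts), 2))
    print(version, "atypical:", atypical(items))
print(version_profile(versions))
```

With these toy counts, `option_1` is most frequent in the magazine genre with only a small academic share, `option_2` is flagged as atypical, and the two versions differ noticeably in mean frequency and academic share. A real analysis would need actual corpus counts and a justified cut-off for "close to zero".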

References

Ahmadi, A. & Thompson, N. A. (2012). Issues affecting item response theory fit in language assessment: A study of differential item functioning in the Iranian national university entrance exam. Journal of Language Teaching & Research, 3(3), 401-412.
Ahmadi, A., Darabi Bazvand, A., Sahragard, R. & Razmjoo, A. (2015). Investigating the validity of PhD entrance exam of ELT in Iran in light of argument-based validity and theory of action. Journal of Teaching Language Skills, 34(2), 1-37.
Akbari, N. (2016). Word frequency and morphological family size effects on the accuracy and speed of lexical access in school-aged bilingual students. International Journal of Applied Linguistics, 26(3), 311-328.
Alderson, J. C. & Wall, D. (1993). Does washback exist? Applied Linguistics, 14(2), 115-129.
Alderson, J. C. (1996). Do corpora have a role in language assessment? In Thomas, J. & Short, M. (Eds.), Using corpora for language research: Studies in the honour of Geoffrey Leech (pp. 248-259). London: Longman.
Bachman, L. & Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.
Bai, Y. (2005). Authenticity Assessment of Proofreading in NMET by Corpus-based Approach. Unpublished master’s thesis, Guangdong University of Foreign Studies, Guangzhou, China.
Bazvand, A. D., Kheirzadeh, S. & Ahmadi, A. (2019). On the statistical and heuristic difficulty estimates of a high stakes test in Iran. International Journal of Assessment Tools in Education, 6(3), 330-343.
Beglar, D. & Nation, P. (2007). A vocabulary size test. The Language Teacher, 31(7), 9-13.
Beigman Klebanov, B., Ramineni, C., Kaufer, D., Yeoh, P. & Ishizaki, S. (2019). Advancing the validity argument for standardized writing tests using quantitative rhetorical analysis. Language Testing, 36(1), 125-144.
Biber, D., Conrad, S., Reppen, R., Byrd, P. & Helt, M. (2002). Speaking and writing in the university: A multidimensional comparison. TESOL Quarterly, 36(1), 9-48.
Bovaird, J. A., Geisinger, K. F. & Buckendahl, C. W. (2011). High-stakes Testing in Education: Science and Practice in K-12 Settings. Washington, DC: American Psychological Association.
Brown, J. C., Frishkoff, G. A. & Eskenazi, M. (2005). Automatic question generation for vocabulary assessment. Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, 819-826.
Chen, X., Dong, Y. & Yu, X. (2018). On the predictive validity of various corpus-based frequency norms in L2 English lexical processing. Behavior Research Methods, 50(1), 1-25.
Choi, I. C. & Moon, Y. (2020). Predicting the difficulty of EFL tests based on corpus linguistic features and expert judgment. Language Assessment Quarterly, 17(1), 18-42.
Chujo, K. & Hasegawa, S. (2003). Jijieigo no jugyo de motiirareru eibunsozai no goi reberu chousa: BNC (British National Corpus) wo kijun ni site [An investigation of vocabulary levels of materials used in current English class: in reference to BNC]. Jiji Eigogaku Kenkyu, 42, 439-451.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.
Crossley, S. A., Salsbury, T., McNamara, D. S. & Jarvis, S. (2011). Predicting lexical proficiency in language learner texts using computational indices. Language Testing, 28(4), 561-580.
Crosthwaite, P. R. & Raquel, M. (2019). Validating an L2 academic group oral assessment: Insights from a spoken learner corpus. Language Assessment Quarterly, 16(1), 39-63.
Culligan, B. (2015). A comparison of three test formats to assess word difficulty. Language Testing, 32(4), 503-520.
Cushing, S. T. (2017). Corpus linguistics in language testing research. Language Testing, 34(4), 441-449.
Davies, M. (2008). The Corpus of Contemporary American English: 450 million words, 1990-present. Available online at http://corpus.byu.edu/coca
Davis, A. (2006). High stakes testing and the structure of the mind: A reply to Randall Curran. Journal of Philosophy of Education, 40(1), 1-16.
Egbert, J. (2017). Corpus linguistics and language testing: Navigating uncharted waters. Language Testing, 34(4), 555-564.
Ellis, N. C. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24, 143-188.
Gardner, D. & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 35(3), 305-327.
Gebril, A. & Eid, M. (2017). Test preparation beliefs and practices in a high-stakes context: A teacher’s perspective. Language Assessment Quarterly, 14(4), 360-379.
Goodfellow, R., Lamy, M.-N. & Jones, G. (2002). Assessing learners’ writing using lexical frequency. ReCALL, 14(1), 133-145.
Hazenberg, S. & Hulstijn, J. (1996). Defining a minimal receptive second-language vocabulary for non-native university students: an empirical investigation. Applied Linguistics, 17(2), 145-163.
Iranian National Organization for Educational Testing. (2020). www.sanjesh.org
Isaacs, T., Trofimovich, P. & Foote, J. A. (2018). Developing a user-oriented second language comprehensibility scale for English-medium universities. Language Testing, 35(2), 193-216.
Johansson, S. (2009). Some thoughts on corpora and second-language acquisition. Corpora and language teaching, 33-44.
Larsson, M. & Olin-Scheller, C. (2020). Adaptation and resistance: washback effects of the national test on upper secondary Swedish teaching. The Curriculum Journal, 31(4), 687-703.
Laufer, B. (1992). How much lexis is necessary for reading comprehension? In Arnaud, P. J. L. & Bejoint, H. (Eds.), Vocabulary and Applied Linguistics (pp. 126-132). London: Macmillan Academic and Professional.
Laufer, B., Elder, C., Hill, K. & Congdon, P. (2004). Size and strength: Do we need both to measure vocabulary knowledge? Language Testing, 21(2), 202-226.
Lin, D. & Gao, M. (2020). Book review: Teacher involvement in high-stakes language testing. Language Testing, 37(1), 159-162.
Lin, Y. C., Sung, L. C. & Chen, M. C. (2007). An automatic multiple-choice question generation scheme for English adjective understanding. In Workshop on Modeling, Management and Generation of Problems/Questions in eLearning, the 15th International Conference on Computers in Education (ICCE), 137-142.
Mitkov, R. & Ha, L. A. (2003). Computer-aided generation of multiple-choice tests. Proceedings of the HLT-NAACL 03 Workshop on Building Educational Applications Using Natural Language Processing, 2, 17-22.
Monteiro, K. R., Crossley, S. A. & Kyle, K. (2020). In search of new benchmarks: Using L2 lexical frequency and contextual diversity indices to assess second language writing. Applied Linguistics, 41(2), 280-300.
Nation, P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59-82.
Okamoto, M. (2015). Is corpus word frequency a good yardstick for selecting words to teach? Threshold levels for vocabulary selection. System, 51, 1-10.
Pan, M. & Qian, D. D. (2017). Embedding corpora into the content validation of the grammar test of the National Matriculation English Test (NMET) in China. Language Assessment Quarterly, 14(2), 120-139.
Paribakht, T. S. & Webb, S. (2016). The relationship between academic vocabulary coverage and scores on a standardized English proficiency test. Journal of English for Academic Purposes, 21, 121-132.
Park, K. (2014). Corpora and language assessment: The state of the art. Language Assessment Quarterly, 11(1), 27-44.
Pawley, A. & Syder, F. (1983). Two puzzles for linguistic theory. In Richards, J. & Schmidt, R. (Eds.), Language and Communication. London: Longman.
Rafatbakhsh, E., Ahmadi, A., Moloodi, A. & Mehrpour, S. (2021). Development and validation of an automatic item generation system for English idioms. Educational Measurement: Issues and Practice, 40(2), 49-59.
Ravand, H. & Firoozi, T. (2016). Examining construct validity of the master’s UEE using the Rasch model and the six aspects of the Messick’s framework. International Journal of Language Testing, 6(1), 1-18.
Ravand, H., Rohani, G. & Faryabi, F. (2018). On the factor structure (invariance) of the PhD UEE using multigroup structural equation modeling. Journal of Teaching Language Skills, 36(4), 141-170.
Razavipur, K. (2014). On the substantive and predictive validity facets of the university entrance exam for English majors. Research in Applied Linguistics, 5(1), 77-90.
Sasao, Y. & Webb, S. (2017). The word part levels test. Language Teaching Research, 21(1), 12-30.
Schmidtke, J. (2014). Second language experience modulates word retrieval effort in bilinguals: Evidence from pupillometry. Frontiers in Psychology, 5, 1-16.
Schmitt, N. (2012). Formulaic language and collocation. In Chapelle, C. (Ed.), The encyclopedia of applied linguistics (pp. 1-10). New York: Blackwell.
Shohamy, E. (2001). The Power of Tests: A Critical Perspective on The Uses of Language Tests. Harlow, England: Longman.
Shohamy, E., Donitsa-Schmidt, S. & Ferman, I. (1996). Test impact revisited: Washback effect over time. Language Testing, 13(3), 298-317.
Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford, UK: Oxford University Press.
Staples, S., Biber, D. & Reppen, R. (2018). Using corpus-based register analysis to explore the authenticity of high-stakes language exams: A register comparison of TOEFL iBT and disciplinary writing tasks. The Modern Language Journal, 102(2), 310-332.
Taylor, L. & Barker, F. (2008). Using corpora for language assessment. Encyclopedia of Language and Education, 7, 241-254.
Vu, D. V. (2019). A corpus-based lexical analysis of Vietnam’s high-stakes English exams. In The 20th English in Southeast Asia Conference. Singapore: National Institute of Education, Nanyang Technological University.
Weir, C. J. & Milanovic, M. (Eds.) (2003). Continuity and innovation: The History of the CPE, 1913-2002 (Vol. 15). Cambridge, England: Cambridge University Press.