Ahmadi, A. & Thompson, N. A. (2012). Issues affecting item response theory fit in language assessment: A study of differential item functioning in the Iranian national university entrance exam. Journal of Language Teaching & Research, 3(3), 401-412.
Ahmadi, A., Darabi Bazvand, A., Sahragard, R. & Razmjoo, A. (2015). Investigating the validity of PhD entrance exam of ELT in Iran in light of argument-based validity and theory of action. Journal of Teaching Language Skills, 34(2), 1-37.
Akbari, N. (2016). Word frequency and morphological family size effects on the accuracy and speed of lexical access in school-aged bilingual students. International Journal of Applied Linguistics, 26(3), 311-328.
Alderson, J. C. & Wall, D. (1993). Does washback exist? Applied Linguistics, 14(2), 115-129.
Alderson, J. C. (1996). Do corpora have a role in language assessment? In Thomas, J. & Short, M. (Eds.), Using corpora for language research: Studies in the honour of Geoffrey Leech (pp. 248-259). London: Longman.
Bachman, L. F. & Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.
Bai, Y. (2005). Authenticity Assessment of Proofreading in NMET by Corpus-based Approach. Unpublished master’s thesis, Guangdong University of Foreign Studies, Guangzhou, China.
Bazvand, A. D., Kheirzadeh, S. & Ahmadi, A. (2019). On the statistical and heuristic difficulty estimates of a high stakes test in Iran. International Journal of Assessment Tools in Education, 6(3), 330-343.
Beglar, D. & Nation, P. (2007). A vocabulary size test. The Language Teacher, 31(7), 9-13.
Beigman Klebanov, B., Ramineni, C., Kaufer, D., Yeoh, P. & Ishizaki, S. (2019). Advancing the validity argument for standardized writing tests using quantitative rhetorical analysis. Language Testing, 36(1), 125-144.
Biber, D., Conrad, S., Reppen, R., Byrd, P. & Helt, M. (2002). Speaking and writing in the university: A multidimensional comparison. TESOL Quarterly, 36(1), 9-48.
Bovaird, J. A., Geisinger, K. F. & Buckendahl, C. W. (2011). High-stakes Testing in Education: Science and Practice in K-12 Settings. Washington, DC: American Psychological Association.
Brown, J. C., Frishkoff, G. A. & Eskenazi, M. (2005). Automatic question generation for vocabulary assessment. Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, 819-826.
Chen, X., Dong, Y. & Yu, X. (2018). On the predictive validity of various corpus-based frequency norms in L2 English lexical processing. Behavior Research Methods, 50(1), 1-25.
Choi, I. C. & Moon, Y. (2020). Predicting the difficulty of EFL tests based on corpus linguistic features and expert judgment. Language Assessment Quarterly, 17(1), 18-42.
Chujo, K. & Hasegawa, S. (2003). Jijieigo no jugyo de motiirareru eibunsozai no goi reberu chousa: BNC (British National Corpus) wo kijun ni site [An investigation of vocabulary levels of materials used in current English classes: In reference to the BNC]. Jiji Eigogaku Kenkyu, 42, 439-451.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.
Crossley, S. A., Salsbury, T., McNamara, D. S. & Jarvis, S. (2011). Predicting lexical proficiency in language learner texts using computational indices. Language Testing, 28(4), 561-580.
Crosthwaite, P. R. & Raquel, M. (2019). Validating an L2 academic group oral assessment: Insights from a spoken learner corpus. Language Assessment Quarterly, 16(1), 39-63.
Culligan, B. (2015). A comparison of three test formats to assess word difficulty. Language Testing, 32(4), 503-520.
Cushing, S. T. (2017). Corpus linguistics in language testing research. Language Testing, 34(4), 441-449.
Davies, M. (2008). The Corpus of Contemporary American English: 450 million words, 1990-present. Available from http://corpus.byu.edu/coca
Davis, A. (2006). High stakes testing and the structure of the mind: A reply to Randall Curren. Journal of Philosophy of Education, 40(1), 1-16.
Egbert, J. (2017). Corpus linguistics and language testing: Navigating uncharted waters. Language Testing, 34(4), 555-564.
Ellis, N. C. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24, 143-188.
Gardner, D. & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 35(3), 305-327.
Gebril, A. & Eid, M. (2017). Test preparation beliefs and practices in a high-stakes context: A teacher’s perspective. Language Assessment Quarterly, 14(4), 360-379.
Goodfellow, R., Lamy, M.-N. & Jones, G. (2002). Assessing learners’ writing using lexical frequency. ReCALL, 14(1), 133-145.
Hazenberg, S. & Hulstijn, J. (1996). Defining a minimal receptive second-language vocabulary for non-native university students: An empirical investigation. Applied Linguistics, 17(2), 145-163.
Iranian National Organization for Educational Testing. (2020). Available from www.sanjesh.org
Isaacs, T., Trofimovich, P. & Foote, J. A. (2018). Developing a user-oriented second language comprehensibility scale for English-medium universities. Language Testing, 35(2), 193-216.
Johansson, S. (2009). Some thoughts on corpora and second-language acquisition. In Corpora and language teaching (pp. 33-44).
Larsson, M. & Olin-Scheller, C. (2020). Adaptation and resistance: washback effects of the national test on upper secondary Swedish teaching. The Curriculum Journal, 31(4), 687-703.
Laufer, B. (1992). How much lexis is necessary for reading comprehension? In Arnaud, P. J. L. & Bejoint, H. (Eds.), Vocabulary and Applied Linguistics (pp. 126-132). London: Macmillan Academic and Professional.
Laufer, B., Elder, C., Hill, K. & Congdon, P. (2004). Size and strength: Do we need both to measure vocabulary knowledge? Language Testing, 21(2), 202-226.
Lin, D. & Gao, M. (2020). Book review: Teacher involvement in high-stakes language testing. Language Testing, 37(1), 159-162.
Lin, Y. C., Sung, L. C. & Chen, M. C. (2007). An automatic multiple-choice question generation scheme for English adjective understanding. In Workshop on Modeling, Management and Generation of Problems/Questions in eLearning, the 15th International Conference on Computers in Education (ICCE), 137-142.
Mitkov, R. & Ha, L. A. (2003). Computer-aided generation of multiple-choice tests. Proceedings of the HLT-NAACL 03 Workshop on Building Educational Applications Using Natural Language Processing, 2, 17-22.
Monteiro, K. R., Crossley, S. A. & Kyle, K. (2020). In search of new benchmarks: Using L2 lexical frequency and contextual diversity indices to assess second language writing. Applied Linguistics, 41(2), 280-300.
Nation, P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59-82.
Okamoto, M. (2015). Is corpus word frequency a good yardstick for selecting words to teach? Threshold levels for vocabulary selection. System, 51, 1-10.
Pan, M. & Qian, D. D. (2017). Embedding corpora into the content validation of the grammar test of the National Matriculation English Test (NMET) in China. Language Assessment Quarterly, 14(2), 120-139.
Paribakht, T. S. & Webb, S. (2016). The relationship between academic vocabulary coverage and scores on a standardized English proficiency test. Journal of English for Academic Purposes, 21, 121-132.
Park, K. (2014). Corpora and language assessment: The state of the art. Language Assessment Quarterly, 11(1), 27-44.
Pawley, A. & Syder, F. (1983). Two puzzles for linguistic theory. In Richards, J. & Schmidt, R. (Eds.), Language and Communication. London: Longman.
Rafatbakhsh, E., Ahmadi, A., Moloodi, A. & Mehrpour, S. (2021). Development and validation of an automatic item generation system for English idioms. Educational Measurement: Issues and Practice, 40(2), 49-59.
Ravand, H. & Firoozi, T. (2016). Examining construct validity of the master’s UEE using the Rasch model and the six aspects of the Messick’s framework. International Journal of Language Testing, 6(1), 1-18.
Ravand, H., Rohani, G. & Faryabi, F. (2018). On the factor structure (invariance) of the PhD UEE using multigroup structural equation modeling. Journal of Teaching Language Skills, 36(4), 141-170.
Razavipur, K. (2014). On the substantive and predictive validity facets of the university entrance exam for English majors. Research in Applied Linguistics, 5(1), 77-90.
Sasao, Y. & Webb, S. (2017). The word part levels test. Language Teaching Research, 21(1), 12-30.
Schmidtke, J. (2014). Second language experience modulates word retrieval effort in bilinguals: Evidence from pupillometry. Frontiers in Psychology, 5, 1-16.
Schmitt, N. (2012). Formulaic language and collocation. In Chapelle, C. (Ed.), The encyclopedia of applied linguistics (pp. 1-10). New York: Blackwell.
Shohamy, E. (2001). The Power of Tests: A Critical Perspective on the Uses of Language Tests. Harlow, England: Longman.
Shohamy, E., Donitsa-Schmidt, S. & Ferman, I. (1996). Test impact revisited: Washback effect over time. Language Testing, 13(3), 298-317.
Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Staples, S., Biber, D. & Reppen, R. (2018). Using corpus-based register analysis to explore the authenticity of high-stakes language exams: A register comparison of TOEFL iBT and disciplinary writing tasks. The Modern Language Journal, 102(2), 310-332.
Taylor, L. & Barker, F. (2008). Using corpora for language assessment. In Encyclopedia of Language and Education (Vol. 7, pp. 241-254).
Vu, D. V. (2019). A corpus-based lexical analysis of Vietnam’s high-stakes English exams. In The 20th English in Southeast Asia Conference. Singapore: National Institute of Education, Nanyang Technological University.
Weir, C. J. & Milanovic, M. (Eds.). (2003). Continuity and innovation: The History of the CPE, 1913-2002 (Vol. 15). Cambridge, England: Cambridge University Press.