Language Learning in Higher Education

Journal of the European Confederation of Language Centres in Higher Education (CercleS)

Editors-in-Chief: Szczuka-Dorna, Liliana / O’Rourke, Breffni

Online ISSN: 2191-6128

Developing CEFR-related language proficiency tests: A focus on the role of piloting

Caroline Shackleton (ORCID iD: http://orcid.org/0000-0003-0221-8575)
Published Online: 2018-09-20 | DOI: https://doi.org/10.1515/cercles-2018-0019

Abstract

Most language proficiency exams in Europe are presently developed so that reported scores can be related to the Common European Framework of Reference for Languages (CEFR; Council of Europe 2001). Before any CEFR linking process can take place, such tests should be shown to be both valid and reliable, as “if an exam is not valid or reliable, it is meaningless to link it to the CEFR [and] a test that is not reliable cannot, by definition, be valid” (Alderson 2012). In the test development process, tasks developed from test specifications must therefore be piloted in order to check that test items perform as predicted. The present article focuses on the statistical analysis of test trial data from the piloting of three B1 listening tasks carried out at the University of Granada’s Modern Language Centre (CLM). Results from a detailed Rasch analysis of the data showed the test to be consistently measuring a unidimensional construct of listening ability. In order to confirm that the test contains items at the correct difficulty level, teacher judgements of candidates’ listening proficiency were also collected. The test was found to separate A2 and B1 candidates well; used in conjunction with the establishment of appropriate cut scores, the reported score can be considered an accurate representation of CEFR B1 listening proficiency. The study demonstrates how Rasch measurement can be used as part of the test development process in order to make improvements to test tasks and hence create more reliable tests.
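The analysis itself was carried out with standard Rasch software (see Linacre 2017 in the references below). For readers unfamiliar with the model, the following minimal Python sketch, using hypothetical ability and difficulty values, illustrates the dichotomous Rasch model (Rasch 1960) on which such an analysis rests: the probability of a correct response depends only on the difference between a candidate’s ability and an item’s difficulty, both expressed on a common logit scale.

    import math

    def rasch_probability(theta, b):
        # Dichotomous Rasch model:
        # P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)),
        # where theta is candidate ability and b is item difficulty (logits).
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # Hypothetical values: a candidate of ability 0.5 logits attempts items
    # of increasing difficulty; P(correct) falls as difficulty rises.
    theta = 0.5
    for b in (-1.0, 0.0, 0.5, 1.0, 2.0):
        p = rasch_probability(theta, b)
        print(f"difficulty {b:+.1f} logits -> P(correct) = {p:.2f}")

When ability equals difficulty (theta = b), the model predicts a 50% chance of success; this is why a well-targeted test needs items whose difficulties bracket the abilities of the intended A2 and B1 candidates, and why cut scores can then be set on the same logit scale as the item difficulties.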

Keywords: Common European Framework of Reference for Languages (CEFR); listening proficiency; assessment; test validity; test reliability; item response theory

References

  • AERA (American Educational Research Association), APA (American Psychological Association) & NCME (National Council on Measurement in Education). 2014. Standards for educational and psychological testing. Washington, DC: AERA.

  • Alderson, Charles J. 2012. Principles and practice in language testing: Compliance or conflict? Presentation at TEA SIG Conference, Innsbruck. http://tea.iatefl.org/inns.html (accessed May 2017).

  • Alderson, Charles J. & Jayanti Banerjee. 2002. Language testing and assessment (part 2). Language Teaching 35(2). 79–113.

  • Alderson, Charles J., Caroline Clapham & Dianne Wall. 1995. Language test construction and evaluation. Cambridge: Cambridge University Press.

  • ALTE/Council of Europe. 2011. Manual for language test development and examining: For use with the CEFR. http://www.coe.int/t/dg4/linguistic/ManualtLangageTest-Alte2011_EN.pdf (accessed January 2017).

  • Bachman, Lyle F. 1990. Fundamental considerations in language testing. Oxford: Oxford University Press.

  • Bachman, Lyle F. 2004. Statistical analyses for language assessment. Cambridge: Cambridge University Press.

  • Bachman, Lyle F. 2005. Building and supporting a case for test use. Language Assessment Quarterly 2(1). 1–34.

  • Bachman, Lyle F. 2007. What is the construct? The dialectic of abilities and contexts in defining constructs in language assessment. In Janna Fox, Mari Wesche, Doreen Bayliss, Carolyn Turner, Liying Cheng & Christine Doe (eds.), Language testing reconsidered, 41–71. Ottawa: University of Ottawa Press.

  • Bond, Trevor G. & Christine M. Fox. 2015. Applying the Rasch model: Fundamental measurement in the human sciences, 3rd edn. New York: Routledge.

  • Buck, Gary. 2001. Assessing listening. Cambridge: Cambridge University Press.

  • Cohen, Jacob. 1988. Statistical power analysis for the behavioral sciences, 2nd edn. Hillsdale, NJ: Lawrence Erlbaum Associates.

  • Council of Europe. 2001. Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.

  • Council of Europe. 2008. Recommendation CM/Rec(2008)7 of the Committee of Ministers to member states on the use of the Council of Europe’s Common European Framework of Reference for Languages (CEFR) and the promotion of plurilingualism. Strasbourg: Council of Europe. http://www.coe.int/t/dg4/linguistic/Conventions_EN.asp (accessed January 2017).

  • Council of Europe. 2009. Relating language examinations to the Common European framework of reference for languages: Learning, teaching, assessment (CEFR): A manual. Strasbourg: Council of Europe. http://www.coe.int/T/DG4/Linguistic/Manuel1_EN.asp (accessed January 2017).

  • Council of Europe. 2017. Common European framework of reference for languages: Learning, teaching, assessment. Companion volume with new descriptors. Strasbourg: Council of Europe.

  • Davies, Alan & Catherine Elder. 2005. Validity and validation in language testing. In Eli Hinkel (ed.), Handbook of research in second language teaching and learning, vol. 1, 795–813. Mahwah, NJ: Lawrence Erlbaum.

  • Field, John. 2008. Listening in the language classroom. Cambridge: Cambridge University Press.

  • Field, John. 2013. Cognitive validity. In Ardeshir Geranpayeh & Lynda Taylor (eds.), Examining listening: Research and practice in assessing second language listening. Cambridge: Cambridge University Press.

  • Green, Rita. 2013. Statistical analyses for language testers. London: Palgrave Macmillan.

  • Green, Rita. 2017. Designing listening tests: A practical approach. London: Palgrave Macmillan.

  • Kane, Michael. 2012. Validating score interpretations and uses. Language Testing 29(1). 3–17.

  • Kane, Michael. 2013. Validating the interpretations and uses of test scores. Journal of Educational Measurement 50(1). 1–73.

  • Kecker, Gabriele & Thomas Eckes. 2010. Putting the Manual to the test: The TestDaF–CEFR linking project. In Waldemar Martyniuk (ed.), Aligning tests with the CEFR: Reflections on using the Council of Europe’s draft Manual, 50–79. Cambridge: Cambridge University Press.

  • Kolen, Michael J. & Robert L. Brennan. 2014. Test equating, scaling, and linking: Methods and practices, 3rd edn. New York: Springer-Verlag.

  • Linacre, John Michael. 2017. Winsteps® Rasch measurement computer program user’s guide. Beaverton, OR: Winsteps.com (accessed January 2017).

  • McNamara, Timothy Francis. 1996. Measuring second language performance. Harlow: Addison Wesley Longman.

  • Messick, Samuel. 1989. Validity. In Robert L. Linn (ed.), Educational measurement, 3rd edn., 13–103. New York, NY: Macmillan.

  • North, Brian & Neil Jones. 2009. Further material on maintaining standards across languages, contexts and administrations by exploiting teacher judgment and IRT scaling. Strasbourg: Council of Europe.

  • Rasch, Georg. 1960. Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.

  • Reckase, Mark D. 2010. NCME 2009 presidential address: What I think I know. Educational Measurement: Issues and Practice 29(3). 3–7.

  • Shackleton, Caroline. 2018. Linking the University of Granada CertAcles listening test to the CEFR. Revista de Educación 381. 37–65.

  • Sick, James. 2008. Rasch measurement in language education part 2: Measurement scales and invariance. Shiken: JALT Testing & Evaluation SIG Newsletter 12(2). 26–31.

  • Sick, James. 2010. Rasch measurement in language education part 5: Assumptions and requirements of Rasch measurement. Shiken: JALT Testing & Evaluation SIG Newsletter 14(2). 23–29.

  • Vandergrift, Larry & Christine C. M. Goh. 2012. Teaching and learning second language listening: Metacognition in action. New York: Routledge.

  • Wright, Benjamin D. & Mark H. Stone. 1979. Best test design. Chicago: MESA Press.

  • Wu, Margaret & Ray Adams. 2007. Applying the Rasch model to psycho-social measurement: A practical approach. Melbourne: Educational Measurement Solutions.

  • Xi, Xiaoming. 2008. Methods of test validation. In Elana Shohamy (ed.), Language testing and assessment (Encyclopedia of language and education, vol. 7), 177–196. New York: Springer.

About the article

Caroline Shackleton

Caroline Shackleton is a teacher and test developer at the University of Granada’s Modern Language Centre (Centro de Lenguas Modernas). She holds an MA in Language Testing from the University of Lancaster and a PhD in Applied Linguistics (Language Testing) from the University of Granada. Currently, she is an expert member of ACLES (Association of Higher Education Language Centers in Spain) and regularly provides training in language testing.


Published Online: 2018-09-20

Published in Print: 2018-09-25


Citation Information: Language Learning in Higher Education, Volume 8, Issue 2, Pages 333–352, ISSN (Online) 2191-6128, ISSN (Print) 2191-611X, DOI: https://doi.org/10.1515/cercles-2018-0019.

© 2018 Walter de Gruyter GmbH, Berlin/Boston.
