Sunday, July 9, 3:40pm
On the reliability of assessment and the use of rubrics to assess writing in the linguistics classroom
Daniil M. Ozernyi, doz@u.northwestern.edu
Northwestern University, Evanston, USA
Teaching linguistics in college, particularly at the late undergraduate and early graduate levels, requires assessing written discourse produced by students as part of formative or summative assessment, whether that be an abstract, a short review paper, or a short squib. Rubrics are usually used to facilitate this assessment.
However, teachers rarely consider the reliability (and hence the fairness) of their assessment. Reliability is defined as a component of assessment validity and corresponds roughly to the reproducibility of scores. In the case of rubrics, this is rater reliability. Crucially, this includes not only inter-rater reliability (whether scores are consistent across multiple raters), but also intra-rater reliability (whether a single rater assigns scores consistently).
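As a concrete illustration of what rater reliability measures, the sketch below (not part of the talk; the raters, papers, and 1-4 scoring band are hypothetical) computes two common inter-rater indices, percent agreement and Cohen's kappa, for two raters scoring the same set of papers with a rubric.

    from collections import Counter

    # Hypothetical rubric scores (band 1-4) given by two raters to the same ten papers
    rater_a = [3, 4, 2, 3, 1, 4, 3, 2, 4, 3]
    rater_b = [3, 3, 2, 4, 1, 4, 2, 2, 4, 3]

    n = len(rater_a)
    # Observed agreement: proportion of papers on which the raters agree exactly
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: expected overlap given each rater's marginal score distribution
    marginals_a, marginals_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum((marginals_a[c] / n) * (marginals_b[c] / n)
                   for c in set(rater_a) | set(rater_b))

    # Cohen's kappa corrects raw agreement for agreement expected by chance
    kappa = (p_observed - p_chance) / (1 - p_chance)
    print(f"percent agreement = {p_observed:.2f}, Cohen's kappa = {kappa:.2f}")

The same calculation applied to one rater's scores on two occasions gives an analogous, if rough, picture of intra-rater consistency.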
For example, Rezaei and Lovorn (2010) showed that using a rubric can actually decrease the reliability of assessment when the rater has neither been trained in nor is qualified for designing rubrics to assess writing. This bears heavily on the fairness of assessment in classroom contexts. Yet, while reliability is discussed widely for large-scale assessment (e.g., TOEFL, &c.), it is rarely addressed for small-scale assessments such as those in a linguistics classroom.
In this talk, our objective is threefold:
- familiarize scholarly teachers with the concepts of reliability and validity (argument-based validity specifically; Chapelle 2020; Chapelle and Voss 2021) as they relate to assessing writing;
- consider a few rubrics taken from existing courses, discuss their potential reliability problems, and compare them with rubrics that have been validated (e.g., Galaczi et al. 2011; Uludag and McDonough 2022);
- suggest strategies for practicing teachers to increase reliability in classroom contexts, where validation research is often not feasible.
Highlighting this problem will serve as a useful interdisciplinary connection between language testing and scholarly teaching. It should also encourage teachers to exercise caution while administering written assessments, thereby increasing fairness.
References
Chapelle, C. A. (2020). Validity in language assessment. In The Routledge handbook of second language acquisition and language testing (pp. 11-20). Routledge.
Chapelle, C. A., & Voss, E. (Eds.). (2021). Validity argument in language testing: Case studies of validation research. Cambridge University Press.
Galaczi, E. D., ffrench, A., Hubbard, C., & Green, A. (2011). Developing assessment scales for large-scale speaking tests: A multiple-method approach. Assessment in Education: Principles, Policy & Practice, 18(3), 217-237.
Rezaei, A. R., & Lovorn, M. (2010). Reliability and validity of rubrics for assessment through writing. Assessing Writing, 15(1), 18-39.
Uludag, P., & McDonough, K. (2022). Validating a rubric for assessing integrated writing in an EAP context. Assessing Writing, 52, 100609.