Recently, an opportunity arose to read up a little on the science behind student evaluations of teaching (SET). Our gut feeling has always been that these are popularity measures, student experience reports and superficial “product reviews” rather than solid estimates of student learning, pedagogical effectiveness, course development effects or teacher dedication. Not surprisingly, after reading just a few surveys and meta-studies, our feelings were confirmed.

Heffernan’s (2022) survey in Assessment & Evaluation in Higher Education reveals that student evaluations are strongly influenced by the composition of the student group, the teacher’s background, the teacher’s identity, and other aspects unrelated to the quality of the course or the effectiveness of the pedagogy. The study also shows that SET contribute to unnecessary stress for all teachers, and particularly for those in already vulnerable groups. Esarey and Valdes (2020), again in Assessment & Evaluation in Higher Education, set up the most optimistic conditions and the most benevolent interpretive frameworks for SET. Even under these ideal circumstances, evaluations do not reliably reward the best pedagogy, and a significant proportion of teachers who receive low scores from students still exhibit high instructional quality according to other, arguably more nuanced (even objective), assessments. Stroebe (2020), in the journal Basic and Applied Social Psychology, argues that student evaluations actually encourage poorer teaching and grade inflation. Students want good grades, and teachers want good evaluations, so evaluations become a tool for students to shape or influence teacher behavior. The findings in this article show that students give more positive evaluations to teachers who grade more leniently and to courses perceived as easier to pass.

In addition to all this support from science, the poor response rates that SET often receive cannot be explained away or overlooked. Had we employed this type of design and obtained these kinds of results in a research study, we would have had to redo the study, as the results are not reliable. To then base pedagogical changes (or even worse, assessments of teaching or teacher quality) on them, simply because “they are all we have,” is, to say the least, irresponsible. The mere fact that SET results vary from student group to student group makes them totally unsuitable as agents of change. A course evaluation could potentially be useful for identifying opportunities or seeds for inspiration, or for addressing areas where there is a zero-tolerance policy; not for screening out ‘problems’. This applies to SET and, we argue, to any other ‘evaluation’ model; research shows that they simply cannot be used as decision support for developing courses and pedagogy, and should certainly not be used to control or evaluate teachers (e.g. to ‘weed out bad teachers’). This is an important qualitative difference that also sends a signal to any teaching faculty. Using evaluations in this way casts suspicion on the efforts, skills and interactions that teachers already bring to the classroom. It instils a sense of precarity in teachers and thereby severely undermines trust and collegiality.

So, if the purpose really is to improve the quality of education, SET is not a good way forward; if, however, you instead seek to increase intersectional injustices and the feeling of surveillance, it seems to do the job! There are already too many dysfunctional, invalid and invalidating command-and-control systems on the employer side; we definitely do not need to invent new ones for ourselves!

References
Esarey, J., & Valdes, N. (2020). Unbiased, reliable, and valid student evaluations can still be unfair. Assessment & Evaluation in Higher Education, 45(8), 1106-1120.

Heffernan, T. (2022). Sexism, racism, prejudice, and bias: A literature review and synthesis of research surrounding student evaluations of courses and teaching. Assessment & Evaluation in Higher Education, 47(1), 144-154.

Stroebe, W. (2020). Student evaluations of teaching encourages poor teaching and contributes to grade inflation: A theoretical and empirical analysis. Basic and Applied Social Psychology, 42(4), 276-294.