Guidelines for Interpreting SOIs
Note: The following recommendations were taken from the University of Washington website, with only slight modifications. They remain sound guidance for how both faculty and administrators should consider Student Opinions of Instruction (SOIs).
Student course ratings have many uses, particularly if viewed over time and across courses. Student ratings provide information that instructors can use to identify areas of strength and areas needing improvement in their teaching. Furthermore, departments and teaching units can use student ratings in the aggregate to assess the overall performance of multi-course and multi-instructor units, as well as to evaluate individual instructors for personnel reasons, such as decisions regarding retention, promotion, tenure and merit pay.
The recommendations listed below can provide helpful guidelines for the use of student course ratings in personnel decisions.
- Student ratings must be used in concert with other data that relate to the quality of a faculty member's teaching, rather than as a sole indicator of teaching quality. Other sources such as peer reviews of classroom sessions, peer reviews of curricular materials, and faculty self-reflection should be assessed in addition to student evaluations to gain a true sense of the teaching skills and performance of a faculty member. Consideration of these other sources of evidence is especially important because student ratings alone do not provide sufficient evidence of the extent of student learning in a course.
- Evaluations from more than a single section should be used in making any decision about teaching quality. Research has shown that ratings from at least five courses are necessary to ensure adequate reliability. The validity of the ratings for measuring teaching quality increases as a greater variety of course formats is represented in the data upon which decisions are based. Trends in ratings across years may also be important in assessing teaching.
- Overall ratings of teaching effectiveness are most appropriate to use in personnel decisions. Overall ratings of the teacher and the course tend to correlate more closely with student achievement scores than do other items. More specific items should be used by the faculty member for review of specific skills and areas for improvement.
- Small differences in individual evaluations should not be used as a basis for differential decisions. Because student ratings yield numerical averages, there is a temptation to overestimate the precision of the averages that are presented. Small differences in ratings may not be meaningful. It is better to deal with much broader classifications, such as Excellent/Good/Acceptable/Unacceptable or Significantly Exceeds Expectations/Meets Expectations/Falls Short of Expectations/Falls Significantly Short of Expectations.
- Interpretations of student ratings averages should be guided by awareness that students tend to rate faculty at or near the high end of the scale. It is therefore not appropriate to use the median (or 50th percentile) as a presumed dividing line between strong and weak teachers. It is more appropriate to assume that the majority of teachers are strong. It is also appropriate, when evaluating average ratings of individual instructors, to consider relevant comparisons (see the recommendation on comparative data below) and specific characteristics of courses taught (see the recommendation on course characteristics below).
- Comparative data should be used with caution. Department-wide comparison data might be reported on the summary report. However, for comparisons to be useful, the normative group should be based on more than a narrow population of instructors. Smaller departments may prefer to use norms calculated across a number of similar departments rather than rely on departmental norms.
- Course characteristics should be considered when interpreting results. For example, large lecture courses typically receive lower ratings than smaller courses, courses being taught for the first time receive lower ratings than well-established courses, and introductory courses for non-majors receive lower ratings than upper-division courses for majors. Adjustments for course type should be made in order to gain a fairer sense of the faculty member's teaching skills. One way to adjust for course type is to choose similar courses for normative comparisons.
- Faculty members should be given an opportunity to respond to evaluation results. Faculty should have an opportunity to discuss the objectives of the course, how the teaching methods were used to meet those objectives, and how circumstances in the course might have affected evaluations. Furthermore, other evaluation information gained from a given course (see the first recommendation above) can aid with the interpretation of ratings results. (At VSU, faculty members are given the chance to respond in their annual Faculty Activity Report and Action Plan.)
- Administration of course ratings should be scheduled to maximize the number of respondents. Generally, evaluations will have greater validity when higher proportions of the enrolled students complete evaluation forms. Ratings may not be an accurate reflection of the entire class when smaller proportions of students respond. This problem can be particularly acute in small classes. It is recommended that at least two-thirds of enrolled students be included for the results to merit confidence. As proportions decrease, particularly in small classes, there is greater opportunity for the ratings of one or a few students to disproportionately affect the results.
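The arithmetic behind the last recommendation can be made concrete with a short Python sketch. It is an illustration only, not part of the original guidelines: the two-thirds threshold comes from the text above, while the 1-to-5 rating scale, the function names, and the sample ratings are assumptions made up for the example.

```python
from fractions import Fraction
from statistics import mean

# Threshold taken from the recommendation above (at least two-thirds).
MIN_RESPONSE_RATE = Fraction(2, 3)


def response_rate(responses: int, enrolled: int) -> Fraction:
    """Proportion of enrolled students who completed the evaluation."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    if not 0 <= responses <= enrolled:
        raise ValueError("responses must be between 0 and enrolled")
    return Fraction(responses, enrolled)


def meets_threshold(responses: int, enrolled: int) -> bool:
    """True when the response rate is at least two-thirds."""
    return response_rate(responses, enrolled) >= MIN_RESPONSE_RATE


def shift_from_one_rating(ratings, new_rating) -> float:
    """Change in the mean when one additional rating is included."""
    return mean(list(ratings) + [new_rating]) - mean(ratings)


# Hypothetical data on an assumed 1-5 scale: both classes start at a mean of 4.75.
small_class = [5, 5, 4, 5]
large_class = [5] * 30 + [4] * 10

small_shift = shift_from_one_rating(small_class, 1)
large_shift = shift_from_one_rating(large_class, 1)

print(f"Small class: one rating of 1 moves the mean by {small_shift:.2f}")
print(f"Large class: one rating of 1 moves the mean by {large_shift:.2f}")
print("20 of 30 responding meets threshold:", meets_threshold(20, 30))
print("19 of 30 responding meets threshold:", meets_threshold(19, 30))
```

In this made-up case a single low rating lowers the small-class mean by three-quarters of a point but moves the large-class mean by less than a tenth of a point, which is the disproportionate effect the recommendation warns about.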
Similar advice is offered in Angela Linse’s article, “Interpreting and using student ratings data: Guidance for faculty serving as administrators and on evaluation committees,” published in Studies in Educational Evaluation in September 2017 (available at https://www.sciencedirect.com/science/article/pii/S0191491X16300232).
This article recommends the following best practices:
- Student ratings should be only one of multiple measures of teaching.
- In personnel decisions, a faculty member’s complete history of student ratings should be considered, rather than a single composite score.
- Small differences in mean (average) ratings are common and not necessarily meaningful.
- Treat anomalous ratings for what they are, not as representative of a faculty member’s teaching.
- Examine the distribution of scores across the entire scale, as well as the mean.
- Evaluate each faculty member individually. Evaluations and decisions should stand alone without reference to other faculty members; avoid comparing faculty to each other or to a unit average in personnel decisions.
- Focus on the most common ratings and comments rather than emphasizing one or a few outlier ratings or comments.
- Contradictory written comments are not unusual.
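The advice to examine the distribution of scores as well as the mean, and to focus on the most common ratings, can be illustrated with a minimal Python sketch. The 1-to-5 scale, the function names, and both sets of ratings are hypothetical and chosen only to show how two sections with the same mean can look very different.

```python
from collections import Counter
from statistics import mean

SCALE = range(1, 6)  # assumed 1-5 rating scale


def rating_distribution(ratings):
    """Count how many ratings fall on each point of the scale."""
    if any(r not in SCALE for r in ratings):
        raise ValueError("every rating must be on the 1-5 scale")
    counts = Counter(ratings)
    return {point: counts.get(point, 0) for point in SCALE}


def summarize(ratings):
    """Return the mean, the full distribution, and the most common rating."""
    distribution = rating_distribution(ratings)
    # If several scale points tie, the lowest one is reported.
    most_common = max(distribution, key=distribution.get)
    return {
        "mean": mean(ratings),
        "distribution": distribution,
        "most_common": most_common,
    }


# Hypothetical sections with identical means but different distributions.
section_a = [4] * 8            # every student gave a 4
section_b = [5] * 6 + [1] * 2  # mostly 5s, with two outlying 1s

for name, ratings in (("A", section_a), ("B", section_b)):
    summary = summarize(ratings)
    print(name, summary["mean"], summary["most_common"], summary["distribution"])
```

Both sections average 4, yet the most common rating in the second is 5 and its mean is pulled down by two outliers. A reviewer looking only at the mean would miss that difference.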