Evaluating teacher effectiveness is one of the more controversial aspects of contemporary educational reform. The IDEA form used by EvCC has some nice features but is relatively generic. At the primary and secondary levels, teaching evaluations can be highly quantitative and linked to standardized testing. One increasingly popular method for these evaluations is the Value Added Model (VAM). VAMs are complicated statistical models that attempt to separate the effects of teachers and schools from the effects of differences in students' backgrounds.
The American Statistical Association (ASA) has just published a statement on using value added models for educational assessment. The statement is brief and uses plain language to describe some of the problems with the value added model (VAM) approach. The statement strongly implies that efforts to improve education by focusing on teaching quality may be misplaced as
Most estimates in the literature attribute between 1% and 14% of the total variability [in outcomes] to teachers. This is not saying that teachers have little effect on students, but that variation among teachers accounts for a small part of the variation in scores. The majority of the variation in test scores is attributable to factors outside of the teacher’s control such as student and family background, poverty, curriculum and unmeasured influences (emphasis in original).
While the statement can be read as discouraging the use of VAMs for high-stakes comparison and evaluation, the ASA does see value in using VAMs for evaluating policies or teacher training programs.
The ASA statement also points out that certain statistical properties of VAM scores (large standard errors) make VAM rankings unstable, and it recommends that VAM estimates always be accompanied by measures of precision. Measures of precision are a way of accounting for uncertainty and are important in any evaluation. When we see a difference between two groups (online vs. face-to-face classes, flipped vs. “normal” classes, EvCC vs. Edmonds), there are always at least two possible explanations. The difference might be the result of the specified difference between the groups or it might be due to chance. Measures of precision tell us about the likely impact of chance. Two common measures of precision are the margin of error that is printed with most reputable polls (although often at the bottom and in small print) and measures of statistical significance.
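To make the two ideas above concrete, here is a minimal sketch in Python. It computes a standard 95% margin of error for a hypothetical poll (the 52% and n = 1000 figures are illustrative, not from the ASA statement), and then simulates two "classes" drawn from the identical score distribution to show that chance alone produces a nonzero difference between group averages.

```python
import math
import random

random.seed(1)  # fixed seed so the simulation is reproducible

# Margin of error for a poll proportion at 95% confidence:
# roughly 1.96 * sqrt(p * (1 - p) / n). Hypothetical poll numbers.
p, n = 0.52, 1000
moe = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"Poll result: {p:.0%} with margin of error +/- {moe:.1%}")

# Chance alone creates differences: draw two groups of 30 test scores
# from the SAME distribution (mean 75, sd 10) and compare their means.
group_a = [random.gauss(75, 10) for _ in range(30)]
group_b = [random.gauss(75, 10) for _ in range(30)]
diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
print(f"Mean difference between identical groups: {diff:.2f} points")
```

The simulated difference is not zero even though the two groups come from the same distribution; that is exactly the kind of chance variation a measure of precision is meant to quantify.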