Sunday, June 20, 2010

Teacher Performance Pay

A number of articles have appeared in recent weeks about incentive pay for teachers in public schools.

In The Atlantic, Dwayne Betts, in "Some Notes on Education," offers these observations:

About a week ago The New York Times ran this article on teachers juicing their students' test scores. Specifically, the article is about Normandy Crossing Elementary School just outside of Houston, where teachers awaited the results of state tests knowing success came with a nice little bonus for them: $2,850. Long story short, the tests came back too good to be true, and after an investigation resignations started coming in. But did they do anything that goes beyond expectation? Tying teacher pay to student performance in this way seems doomed to fail, and the Times article cites sufficient examples of teachers playing with test scores to support that. But the main reason I see linking raises to test scores as a plan doomed to failure is that the system seems not to acknowledge how whatever you can teach a kid this year is tied to what they learned or failed to learn last year.

Of the many things President Obama has done recently, I'm most frustrated with what rarely gets discussed on national television: his education policy. He's not calling for a fundamental shift in the way we do education in the United States. He's calling for, among other things, reforming the NCLB act through improved assessments and an improved accountability system. The push for more assistance going to early education, and expanding Head Start, pre-school, and child care tax credits are all welcome moves. I have no idea where the money to pay for these initiatives will come from, but that's a different issue. What I'm considering here is whether improving assessments, the piece of his plan most relevant to teacher pay, will lead to more teachers looking to nudge test scores.

SchoolFinance101's blog, on the other hand, cuts straight to the chase - litigation:

There are (at least) two very likely legal challenges that will occur once we start to experience our first rounds of teacher dismissal based on student assessment data.

Due Process Challenges

Removing a teacher’s tenure status is denial of a teacher’s property interest and doing so requires “due process.” That’s not an insurmountable barrier, even under typical teacher contracts that don’t require dismissal based on student test scores. Simply declaring that “a teacher will be fired if he/she shows 2 straight years of bad student test scores (growth or value-added)” and then firing a teacher for as much does not mean that the teacher necessarily was provided due process. Under a policy requiring that 51% of the employment decision be based on student value added test scores, a teacher could be wrongly terminated due to:

a) Temporal instability of the value-added measures

Ooooh…Temporal instability… what’s that supposed to mean? What it means is that teacher value-added ratings, which are averages of individual student gains, tend not to be that stable over time. The same teacher is highly likely to get a totally different value-added rating from one year to the next. One policy brief on the subject explains that the year-to-year correlation for a teacher’s value-added rating is only about .2 or .3. Further, most of the change or difference in the teacher’s value-added rating from one year to the next is unexplainable – not by differences in observed student characteristics, peer characteristics or school characteristics. 87.5% (elementary math) to 70% (8th grade math) noise! While some statistical corrections and multi-year measures might help, it’s hard to guarantee or even be reasonably sure that a teacher wouldn’t be dismissed simply as a function of unexplainable low performance for 2 or 3 years in a row. That is, simply due to noise, and not the more troublesome issue of how students are clustered across schools, districts and classrooms.
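To get a feel for what a year-to-year correlation of about .2 or .3 implies, here is a rough Python sketch. It is a toy model and not the method used in the policy brief: it assumes each teacher has a fixed "true" effect, that each year's rating is that effect plus independent noise, and that the true effect accounts for a hypothetical 25% of the rating variance (the `signal_share` parameter). All function names are made up for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_two_years(n_teachers=20000, signal_share=0.25, seed=0):
    """Simulate two years of ratings with total variance 1 per year.

    signal_share is the ASSUMED fraction of rating variance that comes
    from the teacher's stable effect; the rest is independent noise.
    """
    rng = random.Random(seed)
    signal_sd = signal_share ** 0.5
    noise_sd = (1.0 - signal_share) ** 0.5
    true_effect = [rng.gauss(0.0, signal_sd) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]
    return true_effect, year1, year2


def flagged_twice(true_effect, year1, year2, bottom_share=0.2):
    """True effects of teachers rated in the bottom share in both years."""
    n = len(year1)
    cut1 = sorted(year1)[int(bottom_share * n)]
    cut2 = sorted(year2)[int(bottom_share * n)]
    return [t for t, a, b in zip(true_effect, year1, year2)
            if a < cut1 and b < cut2]


if __name__ == "__main__":
    true_effect, y1, y2 = simulate_two_years()
    print("year-to-year correlation:", round(pearson(y1, y2), 2))
    flagged = flagged_twice(true_effect, y1, y2)
    above_avg = sum(1 for t in flagged if t > 0) / len(flagged)
    print("twice-flagged teachers who are actually above average:",
          round(above_avg, 2))
```

In this toy setup the year-to-year correlation comes out close to `signal_share`, and some of the teachers who land in the bottom fifth two years running have an above-average true effect. How large that group is depends entirely on the assumed numbers.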

b) Non-random assignment of students

The only fair way to compare teachers’ ability to produce student value-added is to randomly assign all students, statewide to all teachers… and then of course, to have all students live in exactly comparable settings with exactly comparable support structures outside of school, etc., etc. etc. That’s right. We’d have to send all of our teachers and all of our students to a single boarding school location somewhere in the state and make sure, absolutely sure that we randomly assigned students, the same number of students to each and every teacher in the system.

Obviously, that’s not going to happen. Students are not randomly sorted and the fact that they are not has serious consequences for comparing teachers’ ability to produce student value-added.

c) Student manipulation of test results

As she travels the nation on her book tour, Diane Ravitch raises another possibility for how a teacher might find him/herself out of a job by no real fault of actual bad teaching. As she puts it, this approach to teacher evaluation puts the teacher’s job directly in the students’ hands. And the students can, if they wish, choose to consciously abuse that responsibility. That is, the students could actually choose to bomb the state assessments to get a teacher fired, whether it’s a good teacher or a bad one. This would most certainly raise due process concerns.

d) A whole bunch of other uncontrollable stuff

A recent National Academies report noted:

“A student’s scores may be affected by many factors other than a teacher — his or her motivation, for example, or the amount of parental support — and value-added techniques have not yet found a good way to account for these other elements.”

This report generally urged caution regarding overemphasis of student value-added test scores in teacher evaluation – especially in high stakes decisions. Surely, if I were an expert witness testifying on behalf of a teacher who had been wrongly dismissed, I’d be pointing out that the National Academies said that using the student assessment data in this way is not a good idea.

Title VII of the Civil Rights Act Challenges

The non-random assignment of students leads to the second likely legal claim that will flood the courts as student testing based teacher dismissals begin – Claims of racially disparate teacher dismissal under Title VII of the Civil Rights Act of 1964. Given that students are not randomly assigned and that poor and minority – specifically black – students are densely clustered in certain schools and districts and that black teachers are much more likely to be working in schools with classrooms of low-income black students, it is highly likely that teacher dismissals will occur in a racially disparate pattern. Black teachers of low-income black students will be several times more likely to be dismissed on the basis of poor value-added test scores. This is especially true where a statewide fixed, rigid requirement is adopted and where a teacher must be de-tenured and/or dismissed if he/she shows value-added below some fixed value-added threshold on state assessments.

So, here’s how this one plays out. For every 1 white teacher dismissed on a value-added basis, 10 or more black teachers are dismissed - relative to the overall proportions of black and white teachers. This gives the black teachers the argument that the policy has a racially disparate effect. No, it doesn’t end there. A policy doesn’t violate Title VII merely because it has a racially disparate effect. That just starts the ball rolling – gets the argument into court.
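The phrase "relative to the overall proportions" is doing the work in that comparison, so a small worked example may help. The counts below are entirely hypothetical and the function names are made up for illustration; the point is only that raw dismissal counts can look similar while dismissal rates differ tenfold.

```python
def dismissal_rate(dismissed, total):
    """Share of a group's teachers who were dismissed."""
    return dismissed / total


def relative_dismissal_rate(group_a, group_b):
    """Ratio of group A's dismissal rate to group B's.

    Each group is a (dismissed, total) pair.
    """
    return dismissal_rate(*group_a) / dismissal_rate(*group_b)


# HYPOTHETICAL counts, chosen only to illustrate the arithmetic.
black_teachers = (10, 1000)   # 10 dismissed out of 1,000
white_teachers = (9, 9000)    # 9 dismissed out of 9,000

ratio = relative_dismissal_rate(black_teachers, white_teachers)
print(f"relative dismissal rate: {ratio:.1f}")
```

Here nearly the same number of teachers is dismissed from each group, yet the rate for one group is about ten times the rate for the other, which is the kind of pattern the disparate-effect argument rests on.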

The state gets to defend itself – by claiming that producing value-added test scores is a legitimate part of a teacher’s job and then explaining how the use of those scores is, in fact neutral with respect to race. It just happens to have the disparate effect. Right? But, as the state would argue, that’s a good thing because it ensures that we can put better teachers in front of these poor minority kids, and get rid of the bad ones.

But, the problem is that the significant body of research on non-random assignment of students and its effect on value-added scores indicates that it’s not necessarily differences in the actual effectiveness of black versus white teachers, but that the black teachers are concentrated in the poor black schools and that student clustering and not teacher effectiveness is leading to the disparate rates of teacher dismissal. So they weren’t fired because they were precisely measurably ineffective, they were fired because they had classrooms of poor minority students year after year? At the very least, it is statistically problematic to distill one effect from the other! As a result, it’s statistically problematic to argue that the teacher should be dismissed! There is at least equal likelihood that the teacher is wrongly dismissed as there is that the teacher is rightly dismissed. I suspect a court might be concerned by this.
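The clustering argument can be made concrete with another toy simulation. This is a sketch under loudly stated assumptions and not a model from the research cited: every teacher's true effectiveness is drawn from the same distribution, half of the teachers work in a hypothetical "high-poverty" setting, and unmeasured context lowers their measured rating by an assumed `context_penalty`. The bottom 10% of measured ratings are dismissed.

```python
import random


def dismissal_rates_by_setting(n_teachers=10000, context_penalty=0.5,
                               dismiss_share=0.10, seed=1):
    """Return (rate_high_poverty, rate_other) under a fixed-share cutoff.

    ASSUMPTIONS: identical effectiveness distributions in both settings;
    context_penalty is a made-up shift in measured ratings that the
    value-added model fails to account for.
    """
    rng = random.Random(seed)
    teachers = []
    for i in range(n_teachers):
        high_poverty = (i % 2 == 0)          # half in each setting
        true_effect = rng.gauss(0.0, 1.0)    # same distribution for all
        measured = true_effect - (context_penalty if high_poverty else 0.0)
        teachers.append((measured, high_poverty))

    # Dismiss the teachers with the lowest measured ratings.
    teachers.sort(key=lambda t: t[0])
    dismissed = teachers[:int(dismiss_share * n_teachers)]

    total_high = sum(1 for _, hp in teachers if hp)
    total_other = n_teachers - total_high
    dismissed_high = sum(1 for _, hp in dismissed if hp)
    dismissed_other = len(dismissed) - dismissed_high
    return dismissed_high / total_high, dismissed_other / total_other


if __name__ == "__main__":
    rate_high, rate_other = dismissal_rates_by_setting()
    print("dismissal rate, high-poverty setting:", round(rate_high, 3))
    print("dismissal rate, other setting:", round(rate_other, 3))
```

Even though no group of teachers is less effective by construction, the teachers in the penalized setting are dismissed at a noticeably higher rate. The size of the gap depends on the assumed penalty, which is exactly the quantity that is hard to separate from teacher effectiveness in real data.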

Reduction in Force

Note that many of these same concerns apply to all of the recent rhetoric over teacher layoffs and the need to base those layoffs on effectiveness rather than seniority. It all sounds good, until you actually try to go into a school district of any size and identify the 100 “least effective” teachers given the current state of data for teacher evaluation. Simply writing into a reduction in force (RIF) policy a requirement of dismissal based on “effectiveness” does not instantly validate the “effectiveness” measures. And even the best “effectiveness” measures, as discussed above, remain really problematic, providing tenured teachers reduced on grounds of ineffectiveness multiple options for legal action.

Additional Concerns

These two legal arguments ignore the fact that school districts and states will have to establish two separate types of contracts for teachers to begin with, since even in the best of statistical cases, only about 1/5 of teachers (those directly responsible for teaching math or reading in grades three through eight) might possibly be evaluated via student test scores.

I’ve written previously about the technical concerns over value-added assessment of teachers and my concern that pundits are seemingly completely ignorant of the statistical issues. I’m also baffled that few others in the current policy discussion seem even remotely aware of just how few teachers might – in the best possible case – be evaluated via student test scores, and the need for separate contracts. But, I am perhaps most perplexed that no-one seems to be acknowledging the massive legal mess likely to ensue when (or if) these poorly conceived policies are put into action.

But the high-stakes testing regime affects more than teacher pay. Administrators too are pressured to meet scoring quotas that challenge their ethics. In a New York Times piece called "Under Pressure, Teachers Tamper With Tests," Trip Gabriel reports on NCLB's unethical and legally questionable underbelly:

For seven years, their school, Atherton Elementary in suburban Atlanta, had met the standards known in federal law as Adequate Yearly Progress — A.Y.P. in educators’ jargon — by demonstrating that a rising share of students performed at grade level.

Then, in 2008, the bar went up again and Atherton stumbled. In June, the school’s assistant principal for instruction, reviewing student answer sheets from the state tests, told her principal, “We cannot make A.Y.P.,” according to an affidavit the principal signed.

“We didn’t discuss it any further,” the principal, James L. Berry, told school district investigators. “We both understood what we meant.”

Pulling a pencil from a cup on the desk of Doretha Alexander, the assistant principal, Dr. Berry said to her, “I want you to call the answers to me,” according to an account Ms. Alexander gave to investigators.

The principal erased bubbles on the multiple-choice answer sheets and filled in the right answers.

Any celebrations over the results were short-lived. Suspicions were raised in December 2008 by The Atlanta Journal-Constitution, which noted that improvements on state tests at Atherton and a handful of other Georgia schools were so spectacular that they approached a statistical impossibility. The state conducted an analysis of the answer sheets and found “overwhelming evidence” of test tampering at Atherton.

Crawford Lewis, the district superintendent at the time, summoned Dr. Berry and Ms. Alexander to separate meetings. During four hours of questioning — “back and forth, back and forth, back and forth,” Dr. Lewis said — principal and assistant principal admitted to cheating.

“They both broke down” in tears, Dr. Lewis said.

Dr. Lewis said that Dr. Berry, whom he had appointed in 2005, had buckled under the pressure of making yearly progress goals. Dr. Berry was a former music teacher and leader of celebrated marching bands who, Dr. Lewis said, had transferred some of that spirit to passing the state tests in a district where schools hold pep rallies to get ready.

Dr. Berry, who declined interview requests, resigned and was arrested in June 2009 on charges of falsifying a state document. In December, he pleaded guilty and was sentenced to probation. The state suspended him from education for two years and Ms. Alexander for one year. (Dr. Lewis, who stepped down as superintendent, was indicted last month on unrelated charges stemming from an investigation into school construction, which he denied.)

Dr. Lewis called for refocusing education away from high-stakes testing because of the distorted incentives it introduces for teachers. “When you add in performance pay and your evaluation could possibly be predicated on how well your kids do testing-wise, it’s just an enormous amount of pressure,” he said.

“I don’t say there’s any excuse for doing what was done, but I believe this problem is going to intensify before it gets better.”

And to bring this all into perspective, Seth Godin ponders the economics of education in The coming melt-down in higher education (as seen by a marketer):

For 400 years, higher education in the US has been on a roll. From Harvard asking Galileo to be a guest professor in the 1600s to millions tuning in to watch a team of unpaid athletes play another team of unpaid athletes in some college sporting event, the amount of time and money and prestige in the college world has been climbing.

I'm afraid that's about to crash and burn. Here's how I'm looking at it.

1. Most colleges are organized to give an average education to average students.

Pick up any college brochure or catalog. Delete the brand names and the map. Can you tell which school it is? While there are outliers (like St. Johns, Deep Springs or Full Sail) most schools aren't really outliers. They are mass marketers.

Stop for a second and consider the impact of that choice. By emphasizing mass and sameness and rankings, colleges have changed their mission.

This works great in an industrial economy where we can't churn out standardized students fast enough and where the demand is huge because the premium earned by a college grad dwarfs the cost. But...

2. College has gotten expensive far faster than wages have gone up.

As a result, there are millions of people in very serious debt, debt so big it might take decades to repay. Word gets around. Won't get fooled again...

This leads to a crop of potential college students that can (and will) no longer just blindly go to the 'best' school they get in to.

3. The definition of 'best' is under siege.

Why do colleges send millions (!) of undifferentiated pieces of junk mail to high school students now? We will waive the admission fee! We have a one page application! Apply! This is some of the most amateur and bland direct mail I've ever seen. Why do it?

Biggest reason: So the schools can reject more applicants. The more applicants they reject, the higher they rank in US News and other rankings. And thus the rush to game the rankings continues, which is a sign that the marketers in question (the colleges) are getting desperate for more than their fair share. Why bother making your education more useful if you can more easily make it appear to be more useful?

4. The correlation between a typical college degree and success is suspect.

College wasn't originally designed to merely be a continuation of high school (but with more binge drinking). In many places, though, that's what it has become. The data I'm seeing shows that a degree (from one of those famous schools, with or without a football team) doesn't translate into significantly better career opportunities, a better job or more happiness than a degree from a cheaper institution.

5. Accreditation isn't the solution, it's the problem.

A lot of these ills are the result of uniform accreditation programs that have pushed high-cost, low-reward policies on institutions and rewarded schools that churn out young wanna-be professors instead of experiences that turn out leaders and problem-solvers.

All of this spells an unhappy ending for a system that is already bankrupting the country.
