Measurement Error?

Some of the most miserable experiences that I’ve ever had with staff development have been connected to reviews of standardized test results.  One session in particular has been burned in my mind because the presenter—whom I happened to admire and enjoy—spent the better part of the session battering my sixth grade language arts teachers and me with discouraging numbers that seemed to reveal that we were the least effective teachers in the universe. 

In fact, he described us in front of our entire faculty as "decidedly average" because our students were showing the least amount of growth on the state’s exam when compared to the other grade levels and departments in our school.  "You’ll notice," he said, "that you’ve only slightly exceeded the 4-point average growth that middle grades students typically post on the reading portion of the state’s end-of-grade exams."

Fighting back a bit, one of my peers raised her hand and asked about the measurement error—a statistic expressing how precisely a test score reflects a student’s actual achievement—on the exam.  She was expressing a doubt that has crept into the minds of many educators.  "If we’re going to use these scores as an indicator of our performance, then we need to have confidence that they are reliable measures of student performance," she explained. 

After an awkward pause, the presenter explained that the measurement error on the exam stood at 3 points!

We were shocked and a bit offended all at once.  After all, how seriously could we possibly take the results of our standardized exams when average student growth statewide just barely exceeded the measurement error of the exam?  "Does that mean that when students make average growth, they may—or may not—have mastered a year’s worth of content?" my friend asked.  "Is it possible that some of the kids who failed the exam actually could have made expected growth?" 

"Well, sort of," replied the presenter.  "But you also have to remember that some of the kids who passed the exam actually didn’t make expected growth too."

Needless to say, we didn’t take any comfort from his explanation!  After all, these scores had just been used as a cudgel to criticize our performance. "If we’re going to get bruised over these scores," my colleague muttered, "then I’d like it to be by evidence that is a bit more definitive than a strong maybe!"

We were all left to wonder what implications our newest discovery should have for teaching and learning—and student assessment—at the school, district and state level.  Should my team truly embrace test results as an accurate indicator of performance—and then use those scores to make instructional decisions—when a year’s growth is just slightly higher than the measurement error?  Should we accept accountability for results on an exam that seems barely reliable?
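The arithmetic behind that worry is easy to sketch. As a rough illustration (assuming the reported 3-point measurement error behaves like a normally distributed standard error, which real scaled scores only approximate), a short script shows how wide the uncertainty bands are. Note that a growth score is the difference of two noisy scores, so its error is even larger than the measurement error of a single score:

```python
import math

SEM = 3.0          # reported standard error of measurement, in scale points
AVG_GROWTH = 4.0   # typical one-year growth on the reading exam, in scale points
Z_95 = 1.96        # z-value for a 95% confidence band (assumes normal errors)

# Band around a single observed score: the true score is likely
# within +/- z * SEM of what the test reports.
single_score_band = Z_95 * SEM                     # ~5.9 points

# A growth score subtracts two noisy scores, so its standard error
# is sqrt(SEM^2 + SEM^2) -- wider than the SEM of either score alone.
sem_growth = math.sqrt(2) * SEM                    # ~4.24 points
growth_band_low = AVG_GROWTH - Z_95 * sem_growth   # ~ -4.3 points
growth_band_high = AVG_GROWTH + Z_95 * sem_growth  # ~ +12.3 points

print(f"95% band for a single score: +/- {single_score_band:.1f} points")
print(f"95% band for a 4-point gain: {growth_band_low:.1f} to {growth_band_high:.1f} points")
```

Under those assumptions, the 95% band around a 4-point gain runs from roughly -4 to +12 points, which is exactly why average growth that barely clears the measurement error says so little about any individual student.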

I’m honestly confused because I’ve been trying to support standardized testing as a tool for change… but now I’m not so sure it’s even something I should consider. "The test" has lost even more credibility in my book!

10 thoughts on “Measurement Error?”

  1. Roger Sweeny

    “We know that a student who reads on the college level can simply have a bad day, simply choose to fool around on the test, or the test itself might be flawed.”
    If a student “chooses to fool around” on a “pass-or-you-don’t-graduate” test, I have no problem with not graduating him. I would also think seriously of nominating him for a Darwin Award.
    If the test is flawed enough to say that someone reading at a college level actually reads at a level of the 7th month of fifth grade, then of course no one should use it.
    If someone has a bad day and is scored as reading at a 12th grade level rather than a 13th, I don’t see that as a problem. No one with any sense expects pinpoint accuracy in a test.
    If someone has to pass a test at an eighth-grade reading level to get a twelfth-grade diploma, I don’t have any problem if a bad day causes that person to be scored at a seventh-grade level rather than the eighth-grade level that would be accurate. I have no problem denying that person a high school diploma.
    We all know that in the good old days, pre-high stakes tests, teachers were often expected to pass students as long as they had done most of the work and attended most of the days.
    That is a major reason high school diplomas have so little status today and why a friend of a friend once said, “I thought my diploma from ________ High School meant something, until my brother got one. Now I know it means nothing.”

  2. Mike

    One of the primary implications inherent in the arguments of those who support high stakes, mandatory testing is that absent such tests, there is no way to measure student or school performance. It is implied because even the most rabid advocate of this sort of testing tends to shy away from making this point explicitly; it’s a bit too much even for most true believers, but perhaps someone posting here will surprise me.
    Jake asserted that these tests solve the problem of being able to compare students, teachers and schools. Sorry, but as a working teacher, I don’t see this as a problem at all. What does it matter to me how a teacher in a school in another part of my community or even of my state is performing? That’s up to their school district and its supervisors to determine, and ultimately, it’s up to the voters in that district and mine. And while being able to compare data might make state and federal educrats tingle with delight, it does nothing at all to help me teach more effectively to know that my school scored 1.249% higher in measure A and 3.281% lower in measure B than school Q or state P.
    And Roger’s suggestion that a single test was useful in gauging reading ability is partially correct, but only in the narrow sense that it can be useful as a small part of a much larger picture of a given student’s abilities and performance. We know that a student who reads on the college level can simply have a bad day, simply choose to fool around on the test, or the test itself might be flawed. Those who would argue that such is unlikely might want to recall something or other about the SATs or a similar test a few years back screwing up thousands of kids trying to enter college.
    The point is that by far the assessment method that is most reliable, least prone to error, and that offers the greatest degree of actual accountability is the classroom teacher. I speak of the teacher who, over the course of a year, deals with each and every student daily, reads hundreds of pages of their students’ work on a wide variety of assignments, and who works intimately with each student, knowing them and everything that would be likely to have an effect on their school performance.
    Want to save tens of millions of dollars wasted annually (it’s about 35 million in Texas, for example) on state level testing? Ask the teachers how their kids are doing. That won’t cost much and it will be much, much more accurate and reliable, to say nothing of saving the taxpayers a huge amount of money.
    So, perhaps another question is in order: Which is the best source of information about a given student: The teacher, or a single score on a high stakes test?

  3. Roger Sweeny

    “how many of you believe that a single test given on a single day, no matter how brilliantly and perfectly produced, can reveal anything definitive and meaningful about your intellect, ability and progress?”
    Me, me, I do!
    If one of my students takes a test and it says he reads at a grade level of 5.7 (fifth grade, seventh month), it tells me something very important. It tells me he doesn’t belong in ninth grade.
    Does he really read at 5-7? Probably not. But I can be pretty certain that his reading level is at least 4th grade but not more than 7th.
    High stakes can’t-get-a-diploma-without-them tests are similar. Since passing them requires about an 8th grade level of achievement, I know that anyone who can’t pass does NOT deserve a high school diploma.

  4. Parry

    As I understand it, the results for any one student on a standardized test are considered by testing experts to be very weak measures. The standard error of measurement for an individual student is so high that it is next to impossible to make valid inferences about what that one student did or did not learn over the course of a year, based on a single test.
    As you move toward a class average, you start to get on more solid ground because the positive and negative swings begin to average out. Nevertheless, it is again problematic to make valid inferences about the quality of teaching and learning in a classroom based on one year’s worth of tests. This year’s low-performing teacher (as measured by standardized test scores) could very well be next year’s high flyer. You would want to see consistent patterns over time before jumping to conclusions; for example, if your scores are low year-in and year-out, and other teachers at the same grade level are seeing consistently higher scores, then you may have something.
    Moving up to the level of a complete grade, you could expect to make more valid inferences, but there are still problems. Have the 6th grade scores been the lowest in the school for the last three or four years, or was it just this one year that they were “decidedly average”? It is also difficult to compare grades, because you’re talking about apples to oranges comparisons. If the 7th grade scores were particularly high, were those students’ scores also high in the 6th grade? That could indicate an exceptionally high-achieving class of students.
    Basically, the rule of thumb I have been taught is that the more students you’re talking about, and the more years’ worth of data you’re talking about, the more valid your inferences can be. One year’s worth of data, or just one class’s worth of data, really doesn’t allow for much inferential extrapolation.

  5. Patty

    We so misuse statistical information in this country, and educators are some of the worst. At most, one-shot test data should give us a chance to ask questions. So often the first reaction to test scores is to blame (or praise) the teachers. Sometimes these scores can tell us things about programs we use to teach or about the needs of a particular group of students. Although these scores should only be used as “one piece” of information, it is the piece that gets published in the newspaper. We could require every teacher to take a statistics course, but we can’t require the newspaper readers to do so. Thanks for your blog! I’m going to find that Perlstein book mentioned in one of the comments.

  6. Jack Phelps

    Bill–I’m currently reading through “Tested” by Linda Perlstein, which has got to be the best book written on the effects (positive and negative) of increased standardized testing and testing-driven goals on elementary schools. I can’t recommend it enough as a way to put data + anecdotes (which I’m sure you’ve already got) in your hands to successfully debate any aspect of the testing argument you want with any expert who comes into your school.

  7. Jake Savage

    Answer to Mike’s final question:
    Making students take a standardized test solves the problem of not being able to compare students across classrooms, teachers, and schools. Making students take any test (as I’m sure you do several times a year) solves the problem of not being able to read their minds to find out what they know and can do. I’m sure that most teachers use a single test on a single day to figure out what their students learned in a particular unit, so why wouldn’t the same principle apply for standardized tests?

  8. Mike

    Dear Bill:
    Try this simple thought experiment the next time a “presenter” presents such tripe. Merely ask “how many of you believe that a single test given on a single day, no matter how brilliantly and perfectly produced, can reveal anything definitive and meaningful about your intellect, ability and progress?”
    Yes, a few hands will go up, for they are the true believers, the “test truthers” if you will. Then ask: “How many of you believe it is rational to allow that single test score to determine the future course of your life?”
    If any hands remain up after that question, ask of those whose hands are not up: “How many of you still think mandatory, high stakes tests are a good idea?”
    These tests are a political “solution,” mandated by politicians using a business model. As such, they have little or nothing to do with education. They are designed to provide the illusion of action to the public. They allow politicians to pound the table top and sputter that by God, they are doing something about education! Why, they’ve introduced accountability, something that was never, ever a part of education before their brilliant testing innovation. Yeah. Sure.
    “But,” the testing truthers will say, “they are forcing bad schools to do something.” Yes. They’re forcing them to do what politicians do: give the appearance of action (What? We didn’t know the school was bad without the test?).
    Final question for the group: “What legitimate problem that we have in education can be fixed by making our students take a test?” Be prepared for a lengthy wait for an answer and for some of the lamest answers you’ve ever heard (if any), even in an inservice.
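Parry’s point in comment 4—that class and grade averages stand on firmer statistical ground than individual scores—is just the arithmetic of averaging: independent errors tend to cancel, so the standard error of a mean shrinks with the square root of the number of students. A minimal sketch, assuming the 3-point measurement error discussed above and independent errors across students:

```python
import math

SEM = 3.0  # assumed measurement error for one student's score, in scale points

# Averaging independent noisy scores shrinks the noise: the standard
# error of a class or grade average falls off as 1 / sqrt(n).
for n in (1, 25, 100, 400):
    se_mean = SEM / math.sqrt(n)
    print(f"{n:>4} students: standard error of the average ~ {se_mean:.2f} points")
```

By a hundred students, the error of the average is down to about 0.3 points—which is why one student’s score is nearly meaningless while a multi-year, whole-grade pattern can actually tell you something.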

Comments are closed.