Read This: One Test to Rule Them All

"So what do I take from all this? I used to spend hours creating and revising the perfect assessment. I'd stress about word choice.

I'd wonder if I was giving too much away or not enough info. I'd try for that perfect balance of academic vocab and accessible language. I'd try to cram 60 questions into 50 minutes or have them write a full lab report in complete silence.

Except that the perfect assessment doesn't exist. What's perfect for Student A is deeply flawed for Student B. But there are certainly better and worse assessments.

So like the Apgar test, I try to make my assessments good enough. Multiple "pretty good" assessments give a more complete picture than any single "great" one possibly can. And NEVER EVER let any one score dictate everything."



I came across this great Jason Buell bit the other day on our obsession with formative assessments.

In it, he describes the disastrous results when doctors get lost in the "tyranny" of the Apgar score. Collecting and comparing data on the appearance, pulse, grimace, activity, and respiration of newborns has, in many cases and for many doctors, become more important than caring for and observing babies.

That resonates, doesn't it?

Don't teachers get wrapped up in results, too? Aren't we just as likely to lose focus when our assessments become more important than the kids that we're teaching?

The question, then, is what are YOU—whether you're a parent, a policymaker, a principal or a teacher—doing to make sure that you don't fall victim to "the tyranny of the score" in your schools?

2 thoughts on "Read This: One Test to Rule Them All"

  1. Gerry

    Hi Bill – great reflection, and one that really points to the need to understand just what test scores mean.
    If I may, I’d like to point out a couple of my personal peeves on test-building, and ask for your thoughts on the question I leave at the end:
    First, assessment quality is a baseline kind of thing; as pointed out, the tests, the questions they’re made of, and the distribution of questions per topic need to be ‘good enough’ to reliably tell you (and the learner) something that you might plausibly put into the cause-and-effect pile. (We did ‘THIS’, and ‘THAT’ happened.)
    This is far from a sure thing… if your goal was to help the kids learn to reason, you need to ask a question that gets at their ability to do that, not one designed to examine retention or knowledge. (You also need to have concentrated on teaching reasoning, not just retention, something that ALSO isn’t always a sure thing in our busy lives.) If both of those conditions hold, then your properly designed and targeted (‘good enough’) question will give both you and the kids information that you can use to improve.
    Secondly, you need to be able to find that data; in other words, you need to be able to discriminate. A great piece of data buried in a composite mark (75%) isn’t going to help anybody at all. We need to design assessments so that all of the questions on ‘Flight’ (grade 6 science 🙂) can be disaggregated from the other general science questions, and then further broken down into levels of cognition and specific understandings.
    Here’s the issue: most people don’t design their assessments so that they can use them to answer the questions they have, and really, if the assessment cannot provide you with answers to YOUR questions, it surely won’t help the kids with THEIR questions.
    So here’s the set-up for the question: We can both agree that a 75% ‘Class-average’ is pretty useless for figuring out how to improve. Slightly less useless is little Randy’s 68% composite mark. We might be able to drill deeper and tell that Randy did OK on the selected-response part and poorly on the long-answer question, or that he labeled the diagram correctly but only got half of the definitions… but that might tell us more about the time limits of the exam than about Randy.
    The important questions we might be asking look like:
    –“What specific understandings does Randy appear competent with?”
    –“Which specific skills has Randy developed and shown some mastery of?”
    –“Have our shared instructional experiences achieved their aims, or should I consider re-teaching Randy some things?”
    –“Have I found insights which might help me improve my instructional practices?”
    (The first two questions are for Randy to reflect on, while the second two are for us to consider. There are obviously more, but I think this gets the point across…)
    So (finally!) here’s the question for you (and all of us, really): despite all of our education, do we routinely create assessments that help us improve (by giving us the answers to the questions above), or do we create assessments that will help us calculate and measure kids and teachers?
    The answer might tell us more about what WE think is the goal of education than we like to admit.
    Funny how we tend to push back against outside influences that try to do to us what we routinely do to kids…

  2.
    It is easy to focus too much on the scores. I think this comes with the importance placed on test score data. Hey, school ratings and funding make scores important, if not in the right way.
    I hadn’t thought of formative assessments as overbearing, because we’ve always used formative assessment. I have, however, thought that the threat of being wrapped up in score data came from the emphasis placed on summative assessments.
    So how do we avoid becoming wrapped up in the score? What I do is take a step back and remember why we’re in education: kids are kids, and they need to do well, but I can’t lose the forest for the “formative” trees.
