“Will this be for a grade?”
That was a student’s response when I told my Algebra class about an upcoming test. My first reaction was to be flabbergasted. “What do you mean is it for a grade? It’s a test. Of course it’s for a grade!” But then suddenly I understood.
It was October. My students had recently taken six SLO pretests and two 90 Day Plan baseline assessments on top of three MAP assessments. Eleven tests. None of which had counted for a grade.
So it actually made a lot of sense that they were asking the question. But, yes, this one counted for a grade. This one was made by me to assess students’ progress on the content that I had taught them. It was a test. The same kind of test that teachers have been giving for centuries.
The eleven other tests were different. They were assessments given to gather starting-point data. The MAP assessments are nationally normed tests in reading, math, and science. They indicate how students compare with same-grade peers around the country. The initial SLO (student learning objective) and 90 Day Plan assessments collect baseline data prior to instruction so that students’ learning can be measured by administering a very similar assessment at the end of the unit and then comparing the two scores. The pretests aren’t for a grade because the teaching hasn’t yet occurred, so it would be unfair to hold students accountable for information they haven’t been taught.
Data is king in education right now. “Where’s your data?” is a frequently heard challenge to any stated claim. This can feel very frustrating because, as highly educated and experienced professionals who spend countless hours with students every week, teachers have huge amounts of anecdotal evidence that informs decisions and practices, sometimes in the absence of “hard” data. So the question, “Where’s your data?” can feel insulting.
But although there are, indeed, four letters in data, I no longer believe that “data is just a four-letter word.”
In fact, over the course of the past several years, I have shifted from being a huge data naysayer to likely being the biggest “data nerd” in the building. When there are numbers to be crunched, results to be analyzed, or reports to be written, I invariably find myself raising my hand. I’m not entirely sure why I do this. I don’t enjoy it, exactly; it’s more like I’m intrigued and challenged.
Data Collection Versus Research
Data – What does it say? What does it mean? What does it tell us?
And equally – What doesn’t it tell us? What does it leave out? What are its limitations? How might it mislead us?
There is so much data: trend data, cohort data, individual data. Teachers are drowning in data.
Data does tell us something, but it certainly doesn’t tell us everything, and sometimes it isn’t telling us much of anything at all.
Let’s face it, most of the data collected in educational settings is an example of short-term, small-sample action research that is conducted with many variables at play and in the absence of a control group.
This has its place, but in this type of research, we can’t ever rule out that the results are due to a specific group of kids, or a specific version of the test, or a specific test administration – e.g. the wifi wasn’t working properly, it was the week before spring break, or a bird flew in the window. (I swear that actually happened.)
In education, we don’t have the time or resources to conduct long-term research studies across multiple populations with equivalent control groups. Things change too quickly, and besides, the children are growing up. We need to meet their needs in real time.
But educators (and lawmakers) need to be very careful about implying that short-term data collection has the same level of merit as research studies. It does not. That doesn’t mean it has no merit at all, but it means that it must come with a lot of caveats.
Part of the Data Story
The collection of data can provide valuable information. For example, in response to uncomfortably high retention rates, and after exploring the research on grading practices, my team began experimenting with atypical grading policies. We wanted to examine whether our shifts were having an impact, so we collected and analyzed quarter grades for all students.
At the conclusion of the first year of this work, there was almost no change in our retention rate. In addition, the lack of flexibility and punitive tone of the new grading structure had created such discord with parents and students that it wreaked havoc on our ability as teachers to build positive relationships with families and with adolescents. That was a huge cost, one we weren’t willing to repeat, so we made shifts to our policy.
When we were still unhappy with the results after the second year – again, no change to the retention rate – we adjusted further. It wasn’t until the third year that we saw a significant decline (nearly 50%) in our retention rates.
Was this eventual success due to the changes in our policy, was it due to that particular group of students, or was it due to something else entirely? There was no way to know for sure, but for the first time in years, we kept our grading policy exactly the same.
Each of these decisions had been made by examining our data and adjusting accordingly.
These weren’t simple decisions. The issues were complex, for while our very stringent and rigorous policy didn’t shift our retention rates, it had yielded a significantly greater number of students with As and Bs. Essentially our grading structure widened the gap between successful and unsuccessful students as some students rose to the challenge and excelled. As we made changes to our policy based on our concerns, we also felt compelled to maintain the components that had supported these students in their improvement.
The student groups that experienced the first years of our experimentation are now in high school. Despite the fact that they have not been subject to our grading policy for several years, their success has been sustained over time. Honor roll data collected over multiple years has revealed that, as a group, these cohorts of students continue to academically outperform their peers. Although Jack likes to present this data as evidence of the policy’s success, I don’t think it is as clean and clear as that because there is yet another piece of related data that concerns me.
There’s More to the Story
The students from those same over-performing cohorts also had the lowest matriculation rates to our high school program. After middle school, they were leaving Gamble to go to other schools at a higher rate than students from other classrooms in our program. There wasn’t a huge gap in the numbers, but it was a gap nonetheless. Could this, too, have been due to our grading policy? Of course it could have been.
We knew that the relational costs of our policy those first two years had been problematic. Could this stress have caused us to lose students enrolled in our program? I think that it might have. We certainly had families and students who strongly expressed their concerns, and a number of these families ultimately chose to send their children elsewhere.
So what does all this grading data mean? Does it mean that our initial grading policy led to long-term positive academic performance? Maybe. Does it mean that it also caused us to lose students? Maybe. Does it mean that the students in those particular classes were more academically skilled to begin with? Maybe. Does it mean that there is something intriguing about those groups that is potentially somehow related to our policy? Yes.
All of this grading information taken together is certainly interesting, even though it is not conclusive. It’s definitely something to take into consideration when determining policy and instruction, but it’s not something that, in isolation, should drive those decisions. It is also important to remember that a classroom is never identical from one year to the next. In the laboratory, we understand that conducting the same experiment in a slightly different petri dish and with a totally different group of microbes will never yield identical results. This is equally true for the classroom.
In this way, data can also be distracting … and perhaps even damaging.
The eleven baseline assessments that our students took at the start of this school year are a case in point. Eleven bells, or more than two days’ worth of core content instruction, lost to testing already this year, and what did the testing tell us? It told us that for the most part our students are performing below grade level. We knew that already. We knew that because we observe them every day, but now it’s official because we have “data” … and eleven bells’ worth of lost instructional time that could have been used to work toward catching them up to grade level.
The Dark Side of the Story
Data has a dark side beyond the sheer amount of time lost to collecting it. The entire purpose of data is to compare – students to students and teachers to teachers. However, neither students nor teachers exist in a controlled petri dish environment where such pure comparisons can be made.
This is where data can become a damaging element. When students become data points, we run the risk of dehumanizing education, of forgetting to see the whole child, of failing to look at strengths and gifts beyond academic success, and of losing sight of the importance of growth of the individual. I’ve previously written about those concerns.
When teachers are pitted against each other, we experience some of the same concerns. This year I heard a teacher state, “Tomorrow we have that meeting where we get to hear how much better this teacher is than the rest of us.” This same meeting caused another teacher to share that he was struggling with feelings of competition and of not being recognized for the hard work he was doing due to what felt like an increased focus on test results.
These teachers were working hard and were seeing gains with their students, but their professional interpretations were being overshadowed by King Data.
In fact, even the teachers whose test scores were higher than average felt awful at the end of that data meeting. They were asked to define what they had done that had impacted the testing results. On the surface, this seems like such a great question!
Wouldn’t it be great if there were a clear answer? Then those practices could be shared and implemented more broadly. That, of course, was the entire purpose of the discussion. But that intention assumes so many things.
It assumes that the test is a valid measure of student progress. This particular data had come from the AIR test results (Ohio’s required state testing). None of us sitting at that table was convinced that the AIR tests are a fair or valid measure of our students’ progress.
It assumes that the test questions reflect instruction … or perhaps that instruction reflects the test questions. As none of us was invested in “test prep” strategies of teaching, we couldn’t say with certainty that our instruction and the test were clearly in alignment.
It assumes that, even if both of the above assumptions hold, we could tease out which things – which specific practices, or units, or lessons – led to greater testing success.
The more we attempted to engage in this conversation, the more confusing it became.
We shared some of our practices, but this was followed almost immediately by recantations that sounded like, “But I don’t think that’s even measured on the test.” “But I don’t know if that is helpful for preparing students for a timed test using cold readings.” “But I can’t imagine that this single lesson made the difference.”
We walked away from that meeting feeling divided, uncomfortable, and unacknowledged. Why?
Because data is only a tiny sliver of a much larger picture of students and teachers.
Edwin Friedman talks about the dehumanizing effects of the “de-differentiation” of data: the idea that if something appears true for a group, it is therefore true for the individuals involved. He writes, “The focus on data to the exclusion of emotional variables leaves the patient [or student] to hope that he or she falls into the right category. This atmosphere not only turns patients [or students] into statistics; ultimately it turns them into data.”
He notes that furthermore, “the data themselves are formatted in anxiety-provoking formulas that, precisely because they leave out emotional variables, give a deterministic impression.”
I believe that was why we all left that meeting feeling so awful. We were no longer talking about students, or about teachers, or about the critical relationship between students and teachers. We weren’t even really talking conclusively about instruction. We were talking about data, and in doing so, we lost sight of the bigger picture, and we fell prey to believing that the data told us something definitive.
The Whole Story
And that’s the real lesson here. We are led to believe that so-called “hard data” – data by the numbers – is irrefutable. That it is clear and clean, and that it tells us something absolute. But it does not. Quantitative data is just as messy and subject to interpretation as qualitative data – that anecdotal evidence which has been so disparagingly deemed “soft data.”
Does that mean that we shouldn’t discuss data? No, that would certainly be the wrong response. After all, we spend a lot of time collecting data, and that time would be entirely wasted if we didn’t look at the outcomes. And these results do provide us with information … sometimes important information. However, we need to be careful as we engage in these conversations.
Here are some guidelines to help keep data in perspective:
- Consider the validity of the measurement tools
- Consider the value of the measurement tools
- Work to establish data collection procedures that support learning, rather than detract from it
- Share multiple forms of data – this may include test data, but should also take into account classroom-based assessments, work sample analysis, and observational, or anecdotal, records
- Actively look for potential flaws in the data, alternate interpretations, and hidden side effects
- Openly share these caveats
- Never forget that any single piece of data provides only a tiny sliver of information about what is happening in the classroom
The knowledge a teacher has about individual students is based on thousands of instructional moments (we could call them “data points”) that occur each school year. This information, taken together, provides a far more comprehensive understanding of student growth than any series of raw numbers could ever offer. It is critically important that we not lose sight of that in the deluge of quantitative data that is currently flooding our educational system.
But do look at the data. Talk about it. Consider what it means. Think of it as a pointillist painting. The dots – the data points, quantitative and qualitative alike – fit together to help you see the child as a whole, but don’t allow there to be so many dots, so many data points, or so much irrelevant information that the child becomes obscured.
So proceed with caution. Consider the merits of the data and choose wisely. Data does tell us something, but it doesn’t tell us everything. As with any pointillist image, you can only see the whole picture by looking, with perspective, at the sum of the parts.
 Friedman, Edwin H., Margaret M. Treadwell, and Edward W. Beal. A Failure of Nerve: Leadership in the Age of the Quick Fix. New York: Church Publishing, 2017, p.103.
 Friedman, Edwin H., Margaret M. Treadwell, and Edward W. Beal. A Failure of Nerve: Leadership in the Age of the Quick Fix. New York: Church Publishing, 2017, p.104.