The topic of “testing” gets a very passionate response from educators (and parents), and not usually a very good one. But ask them about assessments, and you’re likely to get a very different response. It may be a matter of semantics, but the underlying cause is not something to consider lightly.
No good teacher will say assessments are unnecessary. They inform our teaching, highlight misconceptions (both those of the students and those of the teacher), and help us diversify instruction to best meet the needs of individual learners. Assessments are needed for effective education.
But in the current landscape, “testing” seems to have taken on a life of its own. As a recent study highlighted in this article in the Washington Post shows, the use of mandated, standardized tests has proliferated at an astonishing rate over the past 10-15 years. While we may recall yearly standardized tests from our own childhoods, and other countries that outperform our students on various measures report giving just three such tests over a student’s entire school experience, this recent study shows that typical students in the US will take 112 mandated standardized tests between preK and 12th grade. That’s not counting the typical quizzes and tests administered by teachers independently to guide their instruction.
Michael Casserly, executive director of the Council of the Great City Schools, the organization responsible for the study, describes a dog-pile scenario where agency after agency has loaded on redundant and ineffective tests, saying, “You’ve got multiple actors requiring, urging and encouraging a variety of tests for very different reasons that don’t necessarily add up to a clear picture of how our kids are doing. The result is an assessment system that’s not very intelligent and not coherent.”
Case in point: Last school year, Florida kindergarten teacher Susan Bowles outlined the many different layers of tests and assessments she administers to her 5-year-old students in this letter to parents explaining why she would be refusing to administer one of those tests. Amid beginning-of-the-year assessments, curriculum benchmark tests, anecdotal records, and end-of-course tests (an abbreviated load that already sounds like too much), she explained that the state-mandated test was taking one to two weeks of instructional time from her students, three times a year. On top of being time-consuming, the computer-based test yielded unreliable data because 5- and 6-year-olds struggled to use the mouse properly, frequently skipping multiple questions inadvertently.
(Incidentally, her letter, written in anticipation of possibly being fired for noncompliance, sparked a great deal of attention, which actually led to the state pulling the test from grades K-2. Susan Bowles was not fired, but was instead honored by her county as “Teacher of the Year” a few months later. Not all teachers are in the same position. For guidance on dealing with the ethical issue of testing, consider these points from the NAEYC.)
In Chapter 19 of our read along book, What If Everybody Understood Child Development?: Straight Talk About Bettering Education and Children’s Lives (affiliate link), Rae Pica points out that a study done by the Brookings Institution in 2012 showed that $1.7 billion was spent annually at that time on standardized tests in public schools. As large as that number is, it doesn’t even begin to cover the full cost of over-testing, particularly in our early grades.
We have to consider all of the costs. First, the time. Time to take the tests and time preparing for the tests, of course. But also the time it takes to teach a room full of students how to log onto computers with their personal information or to teach kindergarteners how to use a mouse. Tests may be slotted for 30 minutes, but when a child’s developmental reality clashes with a rigid test’s assumptions, tests can stretch to two or three times that. We have to consider the time (days, even weeks) spent with a substitute teacher treading water while the classroom teacher works as a proctor in the hallway instead of guiding real learning in the classroom. The allotment of time in some cases appears to support the sentiment from the Center for American Progress that some states and districts prioritize testing above learning, as noted on page 90 of the read along.
We also have to consider the “soft” costs. Things you can’t always measure in dollars or minutes or percentage points. The stress, the anxiety, the attitudes and approaches toward school and learning. The tears shed in frustration when a kindergartener is overloaded with adult expectations. (The tears shed by teachers who just want to teach according to their professional conscience and ability.) The subtle (and not-so-subtle) shifts in programs, curricula, materials, classrooms, and activities because “we can’t do that and still be ready for the tests.” The shift from a responsive teacher guiding authentic learning to a packaged curriculum guiding the teacher toward test prep. The reduced emphasis on creativity, hands-on experiences, and discussing ideas, and an increased emphasis on one right answer to be clicked or correct bubble to be filled.
So where is the line? When do we shift from appropriate assessments to over-testing?
Here’s what I believe we need to consider.
As new assessments are introduced, we must ask: can young children respond to this test in a reliable manner? If not, what’s the point? When we talk about tests and data and statistics, we have to know that the test is reliable, meaning that the same subjects would produce the same results if they were to take the same test twice. This is one reason that standardized testing used to be rare in grades K-2. Because of developmental constraints, children in grades K-2 notoriously do not provide reliable results on standard paper-and-pencil (or computer) tests.
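To make the idea of reliability concrete, here is a minimal sketch in Python of one common way to check it: give the same test twice and see how closely the two sets of scores track each other (a test-retest correlation). The scores below are invented purely for illustration, and the function and variable names are my own assumptions, not anything drawn from a real assessment.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two sets of scores move together
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each set of scores varies on its own
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical scores for six children on the first sitting of a test
first_sitting = [12, 15, 9, 18, 14, 11]

# Hypothetical second sitting where children score about the same as before
second_sitting_consistent = [13, 15, 10, 17, 14, 12]

# Hypothetical second sitting where scores jump around unpredictably
second_sitting_erratic = [16, 9, 14, 11, 18, 10]

print(round(pearson(first_sitting, second_sitting_consistent), 2))
print(round(pearson(first_sitting, second_sitting_erratic), 2))
```

A value close to 1 means the children landed in roughly the same order both times, which is what a reliable test should produce. A value near zero, or below it, means the scores shifted unpredictably between sittings, so a single score tells us very little about the child.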
Does this test what we think it tests? I’ve read plenty of poorly written questions, worded in such a way as to test a child’s vocabulary or reading ability rather than their conceptual understanding. Rae shares this link to an award-winning principal’s letter outlining concerns over the wording and conceptual framework of a test given to first graders.
Is a test assessing ability level or simply measuring test familiarity? Is a test measuring student understanding of valuable math concepts or simply measuring a student’s fluency with the specific verbiage and packaging of math concepts unique to the company that created both the curriculum and the test?
Not all assessments are of equal quality. Not all measure what they purport to measure.
As we talk about validity, and whether tests are measuring what we assume they measure, we also have to look at the appropriateness of that starting point. Here’s what I mean. A test can be valid, in that it accurately tests what it sets out to test. But we still must ask whether we should be testing that in the first place. Is the standard itself valid?
For example, in the article shared above about an ill-fit test for first graders, Carol Burris takes issue not only with the confusing presentation of the questions, but also with the standards from which they were created. As she pointed out, regarding standards that were “back mapped” from graduation end goals to kindergarten expectations (rather than beginning with what we know children need in the early years and working forward): “There is no evidence that early childhood experts were consulted to ensure that the standards were appropriate for young learners. Every parent knows that their kids do not develop according to a “back map”—young children develop through a complex interaction of biology and experience that is unique to the child and which cannot be rushed.”
Is the test valid in that the standards it is measuring are also valid and appropriate expectations?
Often, tests show us a score of correct answers as the result. But if you subscribe to the pedagogical stylings of theorists like Piaget, Vygotsky, and Montessori, you know that much can be learned from how a child answers incorrectly. As I look over my own fifth grader’s math papers, I can see from his work whether his mistake is due to a calculation error, mistaking perimeter for area, or not understanding the concept at all. Once I recognize that, we can go over the problem again (which he loves, of course *sarcasm*). But if I simply look at the score at the top, the result is a label, not a guide.
This was exactly the case with another of my boys. His teacher had pointed out to me that while his grade in a particular subject was excellent, his score on the computer-based test had been declining with each benchmark. His proficiency and performance in class were outstanding, but his test scores were getting worse instead of better. We both scratched our heads. Was it a testing issue? Was he running out of time? Misunderstanding the question? Struggling with vocabulary? Stumbling through comprehension? We won’t know until one of us shadows him through the test because all we are given from the test is a score and a range (below level, on level, etc.). That assessment, while possibly valuable in other ways, provides incomplete information when it comes to actually guiding instruction.
Do the results guide and inform instruction or simply label children, teachers, and schools?
Is the test appropriate? For standardized tests, you have to control for variables. Have you ever tried to control all the variables in a kindergarten classroom? Time limits, mouse-click accuracy, stray pencil marks, restrictions on when and how long you can take breaks, and keeping everyone on task when minutes drag on beyond an hour are excruciating challenges when working with young learners. We have to assess whether the structure of the test is appropriate for the developmental age and stage of its intended students.
How often are tests being administered? Are there redundancies as agencies overlap? How much learning time are we exchanging for testing time (and test prep, and teacher training, etc.)? When we look at the high frequency of mandated standardized tests, and consider what learning experiences they have displaced, it’s strange that anyone would expect to see a marked improvement in student learning. The investment has been put into testing, not into learning.
Why this test? Are we assessing progress or just testing children and our assumptions about them? Are we checking learning or checking a box? Are we testing to guide instruction or testing to prepare students for future testing? Particularly when challenging the purpose of standardized testing in kindergarten, I’ve heard the rationale that children need to test in K-2 so that they will be prepared for the high-stakes testing in third grade. If three years of test-taking is what prepares them best, we have to ask ourselves if the tests in third grade are actually reliable and valid.
That’s certainly a lot to consider when deciding whether or not our assessments are appropriate. And I’m sure it isn’t even an exhaustive list. Considering the resources — time, money, attention, effort, energy — put into testing, I would say we owe at least this much due diligence.
I can never assert strongly enough that I am absolutely in favor of using assessments as tools for serving children. But we have to be more discerning about which assessments we use, how many, and why and how we use them.
Prescribing testing to improve learning is like prescribing a blood test to improve anemia. The test is valuable, but it doesn’t improve the condition. It only provides the knowledge necessary for taking action.
It seems that in an effort to correct the anemic outcomes in some of our schools, we have simply provided more tests. Imagine if the same time, money, attention, effort, and energy could be channeled into providing the supplements that actually create the needed change.
What are your perspectives and observations when it comes to our age of testing?
And as always, share your questions for the author, Rae Pica. She’ll be answering YOUR questions in the last post in the series!