Assess Your Grading and Assessment Practices

This blog post appeared originally at The Standard, the group blog of the National Board for Professional Teaching Standards (7/5/16).


Being a National Board Certified Teacher (NBCT) is a source of pride for me, providing both a sense of professional accomplishment and a sense of professional companionship with leaders in my field. The certification process provides us with a shared set of concepts and terms we can use to guide our ongoing learning and the improvement of our practices.

Sometimes, the quest for improvement keeps us in comfortable territory, eager to try new materials and lessons that fit certain preconceptions about what we do. Sometimes, however, we back away from more challenging but necessary conversations about shifting paradigms. In this three-part blog series for The Standard, I’m asking NBCTs and teachers in general if we can’t find the will to push ourselves further in these challenging areas. In my last post, I suggested that technology has changed the way all of us think about information and communication, and that even if some of those changes have negative potential, teachers must adapt to the changes in society rather than use the negative potential as an excuse to resist change.

In this blog post, I’m stepping into debates that inspire strong feelings all around, and often divide teachers. If we are truly accomplished teachers who value analysis and reflection for the improvement of teaching, it’s time to accelerate changes in our assessment and grading practices. At the end of this piece, I’ll suggest ways to extend the learning and the dialogue around these issues.

 

Let’s start with homework

Or rather, let’s stop giving so much homework. We all know, as parents if not as teachers, that there’s a lot of busy work out there, packets and worksheets that aren’t really accomplishing much. Alfie Kohn, a well-known proponent of abolishing most homework, has suggested that we should think of “no homework” as the default position, and then expect a clear rationale for homework if assigned. I agree with that position, though I won’t go so far as to say no homework, period. In addition to the questionable academic benefits, I think there are important considerations summed up in this quote from Robert Marzano and Debra Pickering (“The Case For and Against Homework”), offering their take on key findings in Kralovec and Buell (2000):

[Homework] overvalues work to the detriment of personal and familial well-being. The authors focused particularly on the harm to economically disadvantaged students, who are unintentionally penalized because their environments often make it almost impossible to complete assignments at home.

I have seen plenty of examples of unnecessary and excessive homework, and admit that earlier in my career, I assigned some of that unnecessary homework; I reasoned that a steady amount of homework would create a level expectation and prevent students from seeing the valuable homework as something above the accustomed baseline. Now, I would prefer to engage my students and their families in some discussion about homework, so that they understand why it’s sometimes necessary, and why there may be periods of time where the load vacillates. As a high school English teacher, I still believe that students need to do course reading outside of class time. In a sense, it’s the original “flipped instruction” – take in the content between classes, then come to class to practice using the content. Beyond reading and writing, I’ve eliminated almost all other homework from my teaching. In any given subject area, teachers in secondary grades should be well-versed in the arguments against homework if they’re going to justify giving any.

 

Assess your assessment practices

We need to have more conversations about how we assess student learning as well. Too many teachers are still grading work and moving on, without giving students chances to develop mastery when the graded work indicates the need for more practice. We short-circuit this conversation sometimes by defaulting to the ideas of formative and summative assessment, without teasing apart our assumptions about summative assessment and grading. As my friend Jason Buell suggested in his former blog title, our teaching should be “Always Formative.”

I understand there are practical limits to this idea, that depending on the grade level and subject there’s a time to move on to new material. I also understand that revisions and multiple chances present logistical challenges for the teacher work-flow; that’s an argument for changing the work-flow. If we’re analytical and reflective about doing what’s best for student learning, we need to consider how we can change our way of working to accommodate students’ learning needs. Here’s one quick thought: why do we have to have every student complete the same assignments, or same number of assignments? If some of my students demonstrate mastery of oral presentations early on, maybe they should show their content knowledge in another format next time, while students who would benefit from another oral presentation experience prepare for that.

 

Antiquated grading practices detract from learning

There are some common grading practices that are among the most entrenched and poorly conceived things that we do in education. Again, I understand it may be challenging to change. The obstacles are real. Settling for inferior, even damaging practices, because change is difficult, should be inexcusable among accomplished professionals. If you’re averaging all student grades, and especially if you’re doing that including the use of zeroes on a 100-point scale where only the top 50 points are “grades,” my aim is to convince you it’s time to adopt a new approach. I’m not claiming to have all the answers here, but I haven’t shied away from the work of making needed changes.

Have you ever been asked to rate something on a scale of 1 to 10, where 10 is the best and 5 is the worst? Of course not. You’d laugh at the idea. So stop using a grading scale that gives equal proportionality to something that would be like “negative worst” on that scale of 1 to 10. It’s illogical from any informed pedagogical or numerical standpoint. Students who are sufficiently motivated to avoid low grades will be motivated in more logical systems, while students who don’t respond to the negative motivator of harsh grading penalties will at least have a mathematical chance of recovery if they become motivated later. If you assign a student a single grade of zero on a 100-point scale, it takes 16 (equally weighted) grades of 85 to move that student’s average up to 80. Should a student with one zero need 16 Bs to convince us their level of understanding translates to “B”? And if the idea is to penalize lack of preparation, or cheating, why should it take that long to make up for the mistake? Play with the numbers yourself. There is no pedagogical justification for grading with zeroes on a 100-point scale that typically only “counts” grades above 59. If you feel strongly that zero is the appropriate mark for “no work” done, then your scale shouldn’t go up to 100, and the gap between zero and “D” shouldn’t be comparable to the gap between “F” and “A.”
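For readers who want to play with the numbers, here is a minimal sketch of the averaging arithmetic, assuming a plain unweighted mean. The function names are made up for illustration, and the low mark of 50 in the last line is only an arbitrary comparison point, not a recommendation. Under these assumptions, one zero followed by 14 grades of 85 averages about 79.3, and the average reaches exactly 80 at the 16th grade of 85.

```python
def grades_needed(low_mark, later_grade, target):
    """Count the equally weighted `later_grade` scores needed after one
    `low_mark` before the plain average reaches `target`."""
    if later_grade <= target:
        raise ValueError("later_grade must be above target, or the average never gets there")
    count = 0
    # Compare totals rather than dividing, so whole-number inputs stay exact.
    while low_mark + later_grade * count < target * (count + 1):
        count += 1
    return count


def average_after(low_mark, later_grade, count):
    """Plain average of one `low_mark` followed by `count` scores of `later_grade`."""
    return (low_mark + later_grade * count) / (count + 1)


print(grades_needed(0, 85, 80))            # 16
print(round(average_after(0, 85, 14), 1))  # 79.3
print(grades_needed(50, 85, 80))           # 6 (hypothetical low mark of 50)
```

Changing the three arguments shows how quickly, or slowly, a single low mark can be outweighed on any scale.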

From a measurement standpoint, there’s no particular need for a 100-point scale. We can’t meaningfully judge 100 levels of skill or achievement. If we use letter grades A-F (without “E” of course), including plusses and minuses for A-D, that’s 13 levels. I use a 4-point grading scale, using half-point intervals, yielding 9 possible grades (see: Marzano). I’ve never heard a compelling argument for assessing student learning to a finer scale than either of these options. No one can argue that they understand a meaningful difference among grades of 83, 84, 85, 86… Does any teacher need a B+- or an A-+? And the issue is compounded if you think there’s a meaningful difference in the broad ranges we reserve for “F.” We all seem to agree about the difference between a 75 and 90, so if we have a useful scale, the gap between grades of 23 and 38 should be just as meaningful.

We also need some critical examination of the practice of averaging in grades. Why do we penalize some students for not having mastered their learning earlier? We don’t withhold a black-belt from a martial arts student based on the prior inability to perform at a black-belt level. We don’t listen at a recital and judge the pianists’ skills less favorably because they couldn’t play these pieces a month ago. Averages can also mask glaring flaws; we should not give a driver’s license to someone whose array of skills averages out quite well, despite an inability to drive in reverse or park a car. For the sake of brevity, I’ll direct readers to additional resources to learn about standards-based grading.
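To make the two objections to averaging concrete, here is a small sketch with invented numbers on a 4-point scale. The scores, the skill names, and the choice of "mean of the last two scores" as a stand-in for current attainment are all assumptions for illustration; standards-based grading takes many forms and this is not a definition of it.

```python
from statistics import mean

# Hypothetical writing scores for one student, oldest first (made-up data).
writing_scores = [1.5, 2.0, 2.5, 3.0, 3.5, 3.5]

# Hypothetical driving-skill ratings (made-up data).
driving_skills = {
    "steering": 4.0,
    "signaling": 4.0,
    "highway": 4.0,
    "reverse": 1.0,
    "parking": 1.0,
}


def average_grade(scores):
    """Plain average over every score, early and late alike."""
    return round(mean(scores), 2)


def recent_grade(scores, window=2):
    """Average of only the last `window` scores, a rough proxy for current skill."""
    return round(mean(scores[-window:]), 2)


def weakest_skill(skills):
    """Return the (name, rating) pair with the lowest rating."""
    return min(skills.items(), key=lambda item: item[1])


print(average_grade(writing_scores))           # 2.67
print(recent_grade(writing_scores))            # 3.5
print(average_grade(driving_skills.values()))  # 2.8
print(weakest_skill(driving_skills))           # ('reverse', 1.0)
```

The steadily improving writer looks like a 2.67 on average but is currently working at 3.5, and the driver's respectable 2.8 hides two skills rated 1.0.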

I know that on my personal website, blog posts about grading draw continual interest and frequent visits; this related blog post by NBCT Brianna Crowley elicited 47 comments at last count. I hope readers of this blog post will follow up by doing additional reading and research, and extend the dialogue through other blogs or on Twitter. Check out the hashtags for standards-based grading chat – #SBGchat – and teachers throwing out grades – #TTOG. (More here re: teachers throwing out grades).

8 thoughts on “Assess Your Grading and Assessment Practices”

    1. Hi Joy! Thanks for reading, commenting, and adding resources. We need to keep spreading the word, and good information!

  1. Hi David,
    I’m also a NBCT and as a theoretical mathematician and a statistician, I have been keenly interested in the issues of our grading system for some time. I wholly agree that there are tremendous inaccuracies in our traditional systems. But I would like to comment on your statement, “Have you ever been asked to rate something on a scale of 1 to 10, where 10 is the best and 5 is the worst? Of course not. You’d laugh at the idea. So stop using a grading scale that gives equal proportionality to something that would be like “negative worst” on that scale of 1 to 10. It’s illogical from any informed pedagogical or numerical standpoint.” I think there may be an inaccuracy in the analogy. Our current grading system was developed more along the lines of someone shooting an arrow at a target. 60-100% and the student hit the target; less than 60% and the student missed. The scores from 0-60% actually delineate how extreme the miss was. So from an “informed numerical standpoint”, there is a modicum of logic behind it. I think that until we get that point out in the open, and then discuss the pros and cons accordingly, we will not clarify the issue fully. Because until then, we will still have people proposing “fixes” that are not such. I had to battle a district-wide proposal for a 50% being the lowest grade a student could receive, including an assignment or exam never turned in. Fortunately, the measure met great resistance and was not approved. It did not address the real issues in the cases where a 0% was being used either as a punishment (the student cheated) or as a non-data point (the student didn’t turn in anything and therefore there was no reference from which to make an evaluation). If these zeros are understood as such, then the interventions can be more meaningful and targeted. As an example, I do give a student I catch cheating a zero. 
If it’s a homework or class work assignment, the grade is nothing more than a wake-up call, since HW is such a small portion of my students’ grades. If it’s a major project or exam and the impact would be significant, I keep the zero, but then I have a discussion with the student and parents. It usually includes conditions like, “no more cheating”, “increased class participation”, “completed class work and/or homework” or whatever addresses the situation that led to the student feeling like he/she needed to cheat. If they follow up on their end of the bargain, I may replace the zero with the same grade they scored on the final exam or some other assessment that addresses similar standards, or even just drop the zero altogether. If the student simply didn’t make up an exam or turn in a project, and that’s the cause of the zero, then I allow leniency on when it can be submitted, often with some small but ultimately insignificant reduction of the grade; mainly so the students who did do their work on time do not feel slighted, and the students with the late work understand the practical necessity of deadlines. This puts the ball into their court, which is where it should be. Teachers need to do their part, and so do students. But teachers should not be expected to go to great lengths to undertake a student’s responsibility. It’s simply not practical, nor does it ultimately serve the student. With any approach that addresses the real issue of the low grade (there are countless and that’s where the creativity of the teacher and his/her knowledge of the students is critical), the students can ultimately demonstrate their mastery of the concepts, which I think is the ultimate goal here. But until there is a common understanding of what a grade means, and a reliable method of assessing (quantifying) those levels of mastery, there will always be unfair grading systems. Thanks for your efforts in all this. Keep up the good fight!!

    1. Robert, thank you for reading and taking the time to compose such a detailed response. You add some crucial points to the discussion, and I agree especially with your closing remarks that there will always be some unfairness, and that ultimately, our knowledge of our students and our commitment to their learning should inform our decisions and creative solutions.
      Regarding the 0-60 range indicating how far off the mark the student was, I see your point, but I don’t see teachers or students using that supposed information in any kind of informed way. I don’t think those deciles mean anything close to as much as the deciles at the upper end. When giving examples of this I often use 15-point ranges instead of ten to emphasize the point a bit more. I think we all know what we’d say about a student who improves from 77 to 92 – wow, that’s awesome, you’ve gone from average/middling performance and partial mastery to very strong performance, very near mastery. I don’t think anyone has a clue what to say about improvement from 22 to 37, except maybe, that’s moving in the right direction. And if we’re not going to distinguish meaningfully among 10s, 20s, 30s, 40s, and 50s, why have them? We also need to tease apart our assumptions about percentages anyway. We assign numbers and percentages to learning for the convenience of grading, but I’m not convinced we necessarily believe that a student scoring an 85 just needs 6% more knowledge, or needs to be able to answer 6% more questions, etc., to earn an A. As long as we deal with numbers that large, or percentages, I fear we’re distorting the issues of learning.

      1. I’m not saying that our delineation of the first 60% is accurate or insightful. I’m merely pointing out that it is based on a tradition (albeit misguided) of the ‘hit’/’miss’ notion, and that if we acknowledge that aspect, we may gain some further traction on understanding the real issues at hand. Ultimately, in any system where we attempt to quantify a concept that has no clear way to do so, we are dealing with a large degree of arbitrariness. Why, for example, is a 2-4 (on a 1 to 4 scale) often considered “passing”? Why not only a 4, or 3-4? It’s arbitrary. For example, if I’m evaluating a brain surgeon, I’m hoping it’s a 4+++++ (I totally intended the irony there), especially if he or she’s operating on ME. I do agree with how you view the lower 60%. It does not translate linearly, in the sense that equal growth in the lower region does not necessarily mean the same as in the upper 40% region. But again, that’s really arbitrary as well. Sometimes it can mean something similar. It’s really all about statistics, distributions, and what amounts to loose correlations between our assessments and a student’s level of comprehension. In looking at a system like Advanced Placement for example, where there are literally millions of data points, with extensive longitudinal connections, we have a lot more “INFORMATION” on which to base our decision on what we feel is an adequate demonstration of competency. And that’s really the best we can do. If we take a hard, close look at what our scores are telling us, and devise the fairest system we can to reflect what we believe is the level of concept mastery, we will make positive strides in reforming our system. Thanks again.

      2. As a follow up, I’d firstly like to clarify that I’m VERY much a proponent of grade reform and of your efforts. The intent of my responses is to share a single person’s perspective on some of the obstacles I’ve encountered. We cannot ignore the power of tradition. (I LOVED Fiddler on the Roof!!) And in doing so, we have to first realize why we are doing things the way we do to begin with. Once we better understand that notion, then maybe we can have a clearer vision on the paths to take to change outdated methodologies. So that’s my original point in bringing up the “arrows at a target” analogy.
        But as an addendum, I’d also like to propose that another major point of disagreement is that we tend to present our points, which are by our nature singular perspectives, as having more universal applicability than they may have. As an example, you mentioned, “I don’t think anyone has a clue what to say about improvement from 22 to 37…”. Actually, I know of someone who very much does. That’d be me. In one of my classes (AP Calculus) a 22% is a high F and a 37% is two points from a C. I’ve had students literally (and I literally mean literally) dancing when they’ve made that kind of improvement. And I dance (not literally) with them! So my point here is that there is not anything inherently wrong with using a 0-100 scale, versus a 1-9 scale, a 1-4 or even what I use, a 0-1000 scale, since I round to the nearest tenths place. I’m under NO illusion that I can somehow magically measure a student’s learning to that degree of accuracy. Kids just seem to like it when I use this method. It’s what I do with that information afterwards that I feel is pertinent. My grading scale (0-27 F, 28-38 D, 39-53 C, 54-66 B, 67-82 A, above 82 is 100%) is based on 15 years of longitudinal data of my students’ exam scores and how they correlate to their AP exam scores. It’s not perfect for sure, but there is some degree of reliability in those measures. And then before I assign their final course grade, I pore over the entire year’s worth of formalized data, informal observations, and whatever else I can rely upon to determine if the tenths place number sitting next to their name is an accurate representation of their learning. And then that’s translated into a 5-level assessment – that being an A – F. I’m mentioning this merely as a single perspective. Not that I’m claiming I have the answers for everyone or anyone, and certainly not that my methods are universal. They’re not, to be sure. 
But they work for ME and they’re based (loosely) on our traditional system, mainly because that’s what people are used to and that’s what my electronic gradebook and district require. I first started teaching in Sierra Leone, West Africa, then taught on the Navajo Reservation in New Mexico, then went to Saint Lawrence Island in the Bering Sea, afterwards to a coastal village outside of Nome, and finally to a suburban school back in New Mexico. And of the many things I’ve learned, probably the starkest is how extraordinarily diverse educational pedagogy can be and learning still take place. Thanks!
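As a rough sketch of how the cut points quoted in this comment behave, the snippet below maps a percentage to a letter using the stated ranges (0-27 F, 28-38 D, 39-53 C, 54-66 B, 67-82 A). Several things here are assumptions rather than the commenter's actual method: the names are invented, each range's lower bound is treated as a threshold, scores above 82 are simply left as A because "above 82 is 100%" is not modeled further, and the 60/70/80/90 cutoffs stand in for a common traditional scale for comparison.

```python
from bisect import bisect_right

LETTERS = ["F", "D", "C", "B", "A"]

# Lower bounds for D, C, B, A taken from the ranges quoted in the comment.
AP_CALCULUS_BOUNDS = [28, 39, 54, 67]

# Assumed cutoffs for a common traditional scale (not from the comment).
TRADITIONAL_BOUNDS = [60, 70, 80, 90]


def letter_for(percent, lower_bounds):
    """Return the letter whose range contains `percent`.

    bisect_right counts how many lower bounds are at or below the score,
    which is also the index of the matching letter in LETTERS.
    """
    return LETTERS[bisect_right(lower_bounds, percent)]


for score in (22, 37, 85):
    print(score,
          letter_for(score, AP_CALCULUS_BOUNDS),
          letter_for(score, TRADITIONAL_BOUNDS))
# 22 F F
# 37 D F
# 85 A B
```

Under these assumptions, the move from 22 to 37 crosses a grade boundary on the commenter's scale but stays inside the F band on the traditional one.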

        1. Thanks for adding and sharing, Robert. You’ve had quite an educational journey. Your example regarding the 22 improving to a 37 kind of reinforces my point too, since your scale is not the common version I was referring to where everything below 60 is an F (G, H, I, J, K?). Your grade (A-F) distribution is relatively proportional all the way along the 100 point scale, as opposed to having a 10-point range for each grade A-D and a 59-point range for F. Of course, a lot of this depends on the measurement tool as well. Scoring anywhere from 20-40% on a multiple choice (4-5 item) quiz/test probably doesn’t tell us anything (unless we maybe break down results according to question type/topic, and even then…). However, if a student is able to solve 25% of the calculus problems presented on a test, that’s useful info (and probably what you’re dealing with). Your comment about calculating and rounding down to tenths brings up a topic I’ve discussed with other teachers as well. I’m at the opposite extreme. Using a 4 point scale with each graded item possibly earning a mark at 0.5 intervals, I have 9 options, technically. As a practical matter, I really stick with 6-7 of those marks, as no submitted work generates a zero, almost never a 0.5, and only slightly more often a 1.0. If I later calculate grades at the 0.1 interval, each tenth represents more than 2% of the possible range. For most grading categories in my course, I’m using a standards-based approach. At the end of the course, I want to grade attained skill and knowledge, not average performance. There’s no penalty for having modest writing skills early if those skills are much stronger in the end. So, I only look at the average in a category (say, Writing) as one piece of information. 
If a student has steadily improved and has skills clearly matching and partially exceeding the standards, the corresponding mark is 3.5, and I’ll use that in calculations of the course grade, even if the average Writing score was 2.5. If a student’s record is more volatile, I’ll use a final category grade that’s closer to the average since there’s “noise” in the measurement, and averaging is an appropriate tool for addressing that. Still, if the average writing assessment grade comes out to 2.7, I’ll generally use 3.0 in further grade calculations, reasoning that a student cannot accidentally write well and that the higher measurements are more likely close to “true.” (Robert Marzano writes about this concept of what’s “true” in assessment in his book on grading – highly recommended!). So, in any given category, it’s common that I’m “rounding up” considerably. In contrast, I know teachers who are like you in using the tenths, but less aware that the tenths have no accuracy, and their students, to my knowledge, don’t seem to like it – judging by the stressful conversations and emails that ensue. I’ve encouraged them to dispense with tenths and round to the nearest whole number. If you never measured tenths, then, if I understand my math/science correctly, tenths have no business showing up in your final answer. I wouldn’t write a travel article and say, “Prepare for hot weather in New Mexico in August, with daily high temperatures averaging 85.2 degrees.” Yet, I know teachers who hold on to tenths as meaningful digits and won’t (officially) round to the nearest whole number. You wrote: “I pore over the entire year’s worth of formalized data, informal observations, and whatever else I can rely upon to determine if the tenths place number sitting next to their name is an accurate representation of their learning.” Fascinating. 
What if your informal observations and other info you “can rely upon” suggest that digits on the other side of the decimal are not “an accurate representation of their learning”?

        2. You’ve clearly thought through your methodology well and have a great grasp of what it does and does not mean! That’s what it’s all about and kudos to you. To answer your question about what I do if I believe that the grade calculation sitting next to a student’s name seems too high: I’ll never drop it; only raise it. I’m basically going on the notion of margin of error, and I’ll always act on the benefit of the doubt going to the student. But in that case, I’ll look at my system and see why this student earned a grade higher than what I think is reasonable and make adjustments for the future accordingly.
