# Box and Whisker Plot

The story I wish to tell today starts in the late 1950s or early 1960s (and really even quite a bit earlier), even though our story is really about assessment. It was the early computing age, very early, and it was just becoming feasible to have access to a great deal of data. Thus, instead of performing a few calculations by hand, as had been usual up until then, statisticians were developing a bold new science called Exploratory Data Analysis (EDA). John Tukey was the champion of EDA, and he developed the very cutely titled Box and Whisker Plot. Of course, in today’s less refined era, when aeroplane passengers are crammed together like sardines instead of serenaded with piano arias, we simply call them “boxplots.” We are, of course, the worse for it (see Figure 2 at the end to see just how much worse).

Suppose we had the following student scores from some exam: 15 39 41 42 45 46 47 47 49 49 51 51 52 53 53 53 53 54 54 54 54 54 55 55 55 56 56 56 56 56 57 57 57 58 58 59 59 59 60 61 61 62 62 62 62 62 63 63 63 63 63 63 64 65 65 67 67 67 67 68 68 68 69 69 69 69 70 70 71 71 71 72 73 73 73 73 74 74 75 75 76 77 77 77 78 78 78 78 79 80 81 81 82 82 86 87 91 91 92 92

By themselves, these numbers perhaps don’t mean much. Whenever we face new data of any sort, it is very useful to look at a boxplot. I have one for us here in Figure 1. This plot is actually rather interesting. You see, what it does is show pictorially what we call the “five-number summary” of the data above: the minimum, the lower quartile, the median, the upper quartile, and the maximum.

Figure 1
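If you would like to check the numbers behind Figure 1 yourself, here is a minimal Python sketch (not the tool used to draw the figure) that computes the five-number summary. It assumes the same quartile convention as R’s default `quantile()`, which `summary()` uses; Python’s standard `statistics.quantiles` reproduces that convention with `method="inclusive"`. The names `SCORES` and `five_number_summary` are only illustrative.

```python
import statistics

# The 100 exam scores listed above.
SCORES = [
    15, 39, 41, 42, 45, 46, 47, 47, 49, 49, 51, 51, 52, 53, 53, 53, 53, 54, 54, 54,
    54, 54, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 58, 58, 59, 59, 59, 60, 61,
    61, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 64, 65, 65, 67, 67, 67, 67, 68,
    68, 68, 69, 69, 69, 69, 70, 70, 71, 71, 71, 72, 73, 73, 73, 73, 74, 74, 75, 75,
    76, 77, 77, 77, 78, 78, 78, 78, 79, 80, 81, 81, 82, 82, 86, 87, 91, 91, 92, 92,
]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) of the data.

    method="inclusive" interpolates quartiles the same way as R's
    default quantile() (type 7), so the results match summary().
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


print(five_number_summary(SCORES))  # prints (15, 55.75, 63.0, 73.0, 92)
```

Other tools may place the quartiles slightly differently. R’s `boxplot()`, for example, draws its box from Tukey’s hinges (`fivenum()`), which for these scores puts the lower edge of the box at 55.5 rather than 55.75.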

At the far right end of the upper whisker, at 92, we see our data’s maximum. At the end of the left whisker, at 39, we see the end of our “usual” data. Take a look again at that long list of numbers above. Notice how far our smallest value, the 15, sits from the rest of the numbers. Yes, yes, it is our minimum, but it is also different. It is an outlier, and we’d be very silly to make decisions based on just him! That is the power of the boxplot: it shows us just how far away 15 is from all the other points. It is that lone circle to the left of the lower whisker.
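Why is 15 drawn as its own circle while the whisker stops at 39? Most boxplot software, including R’s `boxplot()` by default, follows Tukey’s convention: a point lying more than 1.5 times the interquartile range (IQR, the width of the box) beyond the box is plotted on its own as an outlier, and each whisker stops at the most extreme score still inside that limit. The Python sketch below assumes that 1.5 times IQR rule and the quartile convention from R’s `summary()`; the function names are hypothetical.

```python
import statistics

# The 100 exam scores listed above.
SCORES = [
    15, 39, 41, 42, 45, 46, 47, 47, 49, 49, 51, 51, 52, 53, 53, 53, 53, 54, 54, 54,
    54, 54, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 58, 58, 59, 59, 59, 60, 61,
    61, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 64, 65, 65, 67, 67, 67, 67, 68,
    68, 68, 69, 69, 69, 69, 70, 70, 71, 71, 71, 72, 73, 73, 73, 73, 74, 74, 75, 75,
    76, 77, 77, 77, 78, 78, 78, 78, 79, 80, 81, 81, 82, 82, 86, 87, 91, 91, 92, 92,
]


def tukey_fences(data, k=1.5):
    """Return (lower_fence, upper_fence): k interquartile ranges beyond the box."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whiskers_and_outliers(data, k=1.5):
    """Return (lower whisker end, upper whisker end, sorted outliers)."""
    low, high = tukey_fences(data, k)
    inside = [x for x in data if low <= x <= high]
    outliers = sorted(x for x in data if x < low or x > high)
    return min(inside), max(inside), outliers


print(tukey_fences(SCORES))           # prints (29.875, 98.875)
print(whiskers_and_outliers(SCORES))  # prints (39, 92, [15])
```

Here the box is 73 - 55.75 = 17.25 points wide, so the lower fence sits 25.875 points below the box, at 29.875. The score of 39 is the lowest one inside the fence, so the whisker ends there, while 15 falls outside and gets its circle. R’s hinge-based box gives slightly different fences, but the same whisker ends and the same lone outlier for these data.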

Next, notice that between the ends of the whiskers there are really four distinct areas. The first runs from 39 to 55.75 (it’s tough to see, but I promise it does), all in the left, or lower, whisker. That region, together with the outlier at 15, holds the bottom 25 percent of our scores. Remember, these are student scores of some sort! If we racked and stacked our students, that is where those who found the exam most challenging live. If we want to understand the students who experience true trials in a course, we could look at them.

Next, between 55.75 and 63 lie the 25th through 50th percentiles (the bold dark line in the middle of the box marks the median). These students encountered difficulties, but they outperformed a quarter of all students, which is not too shabby. Still, the course was not easy for them, so if we wanted to find a group who could use just a little more support, we might look there.

Between the median’s bold dark line and the upper hinge, ergo between 63 and 73, we find the 50th to 75th percentiles. We sometimes say these students are in the 3rd quartile (strictly speaking, the third quartile is the cut point at 73, but the informal usage is common). So, roughly one in four of our students scored between 63% and 73% on this exam. Furthermore, those students outscored roughly half the class! They’ve done better than most, and while they do not understand everything about this course (as evidenced by their 63% to 73% scores on the exam), they are doing well enough compared to their peers (as evidenced by living in the 3rd quartile).

Now is as good a time as any to note that if I were teaching this course, I would start to ask myself, “What do I need to say to these students in the 3rd quartile?” Clearly, they put effort into the course. If they didn’t, they wouldn’t be outperforming their peers. On the other hand, their scores (at least some of them) are just a shade lower than I’d prefer. Do they need a motivating talk? Do I need to be clearer in some of my explanations or expectations? Or do they need some combination of the two?

From the upper hinge to the end of the upper whisker, covering scores from 73% to 92%, we find the top 25%. These are the high performers.
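Because several students share the same score (six scored exactly 63 and four scored exactly 73), the four regions do not hold exactly 25 students each. A short Python sketch that sorts each score into a region makes this visible. It assumes a score sitting exactly on a boundary is counted in the lower region, which is a choice made for this illustration rather than a standard rule; the region labels are hypothetical.

```python
import statistics
from collections import Counter

# The 100 exam scores listed above.
SCORES = [
    15, 39, 41, 42, 45, 46, 47, 47, 49, 49, 51, 51, 52, 53, 53, 53, 53, 54, 54, 54,
    54, 54, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 58, 58, 59, 59, 59, 60, 61,
    61, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 64, 65, 65, 67, 67, 67, 67, 68,
    68, 68, 69, 69, 69, 69, 70, 70, 71, 71, 71, 72, 73, 73, 73, 73, 74, 74, 75, 75,
    76, 77, 77, 77, 78, 78, 78, 78, 79, 80, 81, 81, 82, 82, 86, 87, 91, 91, 92, 92,
]

LABELS = ["bottom quarter", "second quarter", "third quarter", "top quarter"]


def region(score, q1, median, q3):
    """Name the part of the boxplot a score falls in (boundary ties go down)."""
    if score <= q1:
        return LABELS[0]
    if score <= median:
        return LABELS[1]
    if score <= q3:
        return LABELS[2]
    return LABELS[3]


# Quartiles with the same convention as R's summary() (type 7).
q1, median, q3 = statistics.quantiles(SCORES, n=4, method="inclusive")
counts = Counter(region(s, q1, median, q3) for s in SCORES)
for label in LABELS:
    print(label, counts[label])
```

Under that convention, the sketch reports 25, 27, 24 and 24 students (the bottom group includes the outlier at 15), so “one in four” is a good approximation rather than an exact head count.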

Now, if I had done what I used to do with data, I might have told you, “My class average is 64.35%.” I probably would have rounded to 64%. And believe me, that is a perfectly fine point estimate. However, it wouldn’t have told you the whole story, would it?
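A single average also hides how much one unusual score can pull on it. The quick Python check below uses the standard `statistics` module; setting the 15 aside is purely for illustration, not a suggestion to drop that student’s score.

```python
import statistics

# The 100 exam scores listed above.
SCORES = [
    15, 39, 41, 42, 45, 46, 47, 47, 49, 49, 51, 51, 52, 53, 53, 53, 53, 54, 54, 54,
    54, 54, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 58, 58, 59, 59, 59, 60, 61,
    61, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 64, 65, 65, 67, 67, 67, 67, 68,
    68, 68, 69, 69, 69, 69, 70, 70, 71, 71, 71, 72, 73, 73, 73, 73, 74, 74, 75, 75,
    76, 77, 77, 77, 78, 78, 78, 78, 79, 80, 81, 81, 82, 82, 86, 87, 91, 91, 92, 92,
]

# Illustration only: the same scores without the outlier at 15.
without_outlier = [s for s in SCORES if s != 15]

print(statistics.mean(SCORES), statistics.median(SCORES))  # prints 64.35 63.0
print(round(statistics.mean(without_outlier), 2),
      statistics.median(without_outlier))                  # prints 64.85 63
```

Setting that one score aside raises the mean from 64.35 to about 64.85, while the median, the dark line in the box, does not move at all. That is part of why the boxplot is built around the median rather than the mean.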

Furthermore, if we’re discussing stories, I could have given you the following summary table:

```r
summary(ten)  # `ten` holds the 100 exam scores listed above
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   15.00   55.75   63.00   64.35   73.00   92.00
```

Except that table doesn’t show what our boxplot does (nor as quickly). I’ve nattered on about the boxplot enough for this week. Next week, we’ll discover just how powerful the boxplot can be for comparing old exams with new ones, or last term with this term and the terms before. Never be shy about getting in touch with me to ask questions, and be sure to look at our CAPE calendar for some real-world workshops from me about assessment and easy ways to look at data.

Until next week, stay safe!

Figure 2
