Box & Whisker Plot Comparisons

Last week, we met the most adorable of Exploratory Data Analysis graphs, the Box and Whisker Plot. This week, we begin to move beyond EDA (though not too much, and certainly not all at once).

Looking at what we were doing last week in our Box and Whisker Plot, we see that we had student performance on a single item. While this is interesting enough, it presents a very static picture.

One of the things we like to look for in assessment is whether something has improved after, well, improvements are made. To do that, we need to track our data long-term. In fact, we may not even know when we start collecting the data precisely how we might use it. Thus, it helps to be a digital hoarder.

Below in Figure 1, I’ve created a fake set of student data. I did this in an Excel file, and I can just append new data from new terms and new students as often as I wish. Suppose I do this for Spring 2014, Fall 2014, and Spring 2015.

[Figure 1]

Remember our box plot from before? This time we simply have one box plot for each term (see Figure 2).


This format makes it easy to see that the typical Final Exam Score is improving. Even better, we can see that not only is the median score improving, but the total length of each box plot is also getting a bit shorter. This means we’re getting more consistent results in student performance.
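If you'd like to see the numbers hiding behind a graph like this, here is a minimal sketch in Python using only the standard library. The scores are made up for illustration (they are not the data behind Figure 2), and the names `records`, `five_number_summary`, and `summarize_by_term` are my own. It assumes the data is kept as one long list of (term, score) rows, the way you might append them to a spreadsheet each term.

```python
from collections import defaultdict
from statistics import quantiles

# Made-up (term, final exam score) rows, appended term after term.
records = [
    ("Spring 2014", 50), ("Spring 2014", 60), ("Spring 2014", 70),
    ("Spring 2014", 80), ("Spring 2014", 90),
    ("Fall 2014", 58), ("Fall 2014", 66), ("Fall 2014", 74),
    ("Fall 2014", 82), ("Fall 2014", 88),
    ("Spring 2015", 66), ("Spring 2015", 72), ("Spring 2015", 78),
    ("Spring 2015", 84), ("Spring 2015", 90),
]

def five_number_summary(scores):
    """Return (min, Q1, median, Q3, max), the five values a box plot draws."""
    # method="inclusive" treats the scores as the whole group of interest.
    # Other quartile conventions exist and can give slightly different values.
    q1, med, q3 = quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, med, q3, max(scores)

def summarize_by_term(rows):
    """Group scores by term, then summarize each term separately."""
    by_term = defaultdict(list)
    for term, score in rows:
        by_term[term].append(score)
    return {term: five_number_summary(s) for term, s in by_term.items()}

summaries = summarize_by_term(records)
for term, (low, q1, med, q3, high) in summaries.items():
    print(f"{term}: median={med}, box length={q3 - q1}, full range={high - low}")
```

With these made-up scores, the median rises each term while both the box length (Q3 minus Q1) and the full range shrink, which is the pattern described above.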

Right now, these sorts of graphs are just a bit tricky to get. However, I wanted us to start looking at them now, for two reasons. The first is that the 2016 version of Microsoft Excel is going to have native support for box plots. I foresee these becoming a great deal more common in the near future. The other reason I wanted to show this sort of graph is that I want to start using them to easily communicate much fancier comparisons than just single-number summaries.

For example, suppose instead of the above graph, I had the one in Figure 3. Each one of these has about the same median score; not much has changed there. If I were reporting data with just a single number, it would read 72%, 73%, 72.5%. But notice how much more consistent each class is getting! Instead of a very tall spread of data, the high performing students and the low performing students get much closer together with each term!

[Figure 3: Three box plots in a row (less variance)]

Please note that I’m not debating if we actually want class data that looks this way! I do claim, though, that looking at the summary in the box plots can give us a much fuller picture of what our data truly look like, much fuller than a single point estimate such as a mean/average score.
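To make that point concrete, here is a small sketch in Python (standard library only). The scores are hypothetical, chosen so the medians match the 72%, 73%, 72.5% mentioned earlier. They are not the data behind Figure 3, and the labels "Term 1" through "Term 3" and the helper `iqr` are my own inventions.

```python
from statistics import median, quantiles

# Hypothetical scores: nearly identical medians, very different spreads.
terms = {
    "Term 1": [40, 58, 72, 86, 100],
    "Term 2": [52, 64, 73, 82, 94],
    "Term 3": [62, 68, 72.5, 77, 83],
}

def iqr(scores):
    """Interquartile range (Q3 - Q1), which is the height of the box."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1

# The single-number report: one median per term.
medians = {term: median(scores) for term, scores in terms.items()}

# What the single number leaves out: how spread out each term is.
spreads = {term: iqr(scores) for term, scores in terms.items()}

for term in terms:
    print(f"{term}: median={medians[term]}, IQR={spreads[term]}")
```

Reporting only `medians` would suggest nothing changed from term to term. The `spreads` values tell the rest of the story, since the box gets noticeably shorter each term.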

I’m going to let this conclude our arc of discussion about box and whisker plots. Next week, we’ll look at a couple of other things.



Matt Wiley is a tenured, associate professor of mathematics with awards in both mathematics education and honour student engagement. He earned an assortment of degrees in computer science, business, and pure mathematics from the University of California and Texas A&M systems. He is the director of quality enhancement at Victoria College, assisting in the development and implementation of a comprehensive assessment program to enhance institutional performance outcomes. A programmer, a published author, a mathematician, and a transformational leader, Matt has always melded his passion for writing with his joy of logical problem solving and data science. From the boardroom to the classroom, he enjoys finding dynamic ways to partner with interdisciplinary and diverse teams to make complex ideas and projects understandable and solvable.
