Last week we talked about machines’ learning. What I want to do this week is give you a thought-example that will lead my gentle readers to a better understanding of what types of analysis we should trust machines to do. It isn’t even always true that machines can do the analysis ‘faster’ than humans might, although they can certainly go longer (and never tire).
Take a look at Figure 1, and ask yourself, how many groups or clusters of dots do you see? One group? Two groups? More? Draw a circle around your groups.
This example looks at data from iris flower measurements in centimetres (from a rather famous data set). It plots the length of the petals against their width, and my elementary statistics students would be comfortable telling you these data are positively correlated. Generally, as iris petal length increases, so does the width.
If I asked you for the mean or ‘average’ for the whole, you’d probably tell me something in the middle. Perhaps Petal Length about 4 and Petal Width about 1.3 or so.
However, it seems fairly clear to me (and, I’m guessing, to most of you) that there are at least two distinct groups of iris flowers. At the lower left, we see some that are quite small, and then up and to the right we find another group. It might make sense to talk about these two groups separately; they are different.
If we’d circled or sorted these into two groups, well, then we’d have two means. There’d be the smaller group’s mean, with a length and width of about 1.5 and 0.3 centimetres, while the larger group would have a mean of perhaps 5 and 1.6 centimetres.
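If you’d like to see that arithmetic done by a machine, here is a minimal Python sketch. The six points are hypothetical values I made up so that they land near the eyeballed means above; they are not rows from the real iris data set, and the helper name `centre` is just my own label.

```python
from statistics import mean

# Hypothetical (petal length, petal width) pairs in cm, invented for
# illustration only; these are NOT rows from the actual iris data set.
small = [(1.4, 0.2), (1.5, 0.3), (1.6, 0.4)]
large = [(4.5, 1.4), (5.0, 1.6), (5.5, 1.8)]

def centre(points):
    """Return (mean length, mean width) for a list of points."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))

overall_mean = centre(small + large)  # one mean for everything lumped together
small_mean = centre(small)            # mean of the lower-left group
large_mean = centre(large)            # mean of the upper-right group

for name, m in [("overall", overall_mean), ("small", small_mean), ("large", large_mean)]:
    print(name, round(m[0], 2), round(m[1], 2))
```

With these made-up numbers, the overall mean falls in the empty gap between the two groups, which is exactly why a single ‘average’ can describe no actual flower very well.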
To see this visually, go ahead and visit RStudio’s Shiny Server. Set the X Variable to Petal.Length in the drop-down, the Y Variable to Petal.Width, and the Cluster Count down to 2. Notice where the two black crosses mark the spots! Notice the recolouring of the dots to show the two groups. And, of course, notice the one odd-man-out dot that, according to the analysis, belongs to the smaller group even though it looks like it ought to belong to the larger one.
Don’t let the code showing below the photo worry you (unless you find it awesomely nerdy). The point is, machines are getting better, though not perfect, at grouping or clustering data.
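For the awesomely nerdy among you, here is a rough idea of how a machine might find two groups on its own. This is a hand-rolled sketch of the general k-means idea (assign each dot to its nearest centre, move each centre to the mean of its dots, repeat), not the code behind the Shiny page. The data points, the starting centres, and the function name `kmeans` are all my own assumptions for illustration.

```python
from math import dist

def kmeans(points, centres, rounds=10):
    """Basic k-means sketch: assign points to the nearest centre, then re-centre."""
    for _ in range(rounds):
        groups = [[] for _ in centres]
        for p in points:
            # Index of the centre closest to this point (straight-line distance).
            nearest = min(range(len(centres)), key=lambda i: dist(p, centres[i]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it put if the group is empty.
        centres = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else old
            for g, old in zip(groups, centres)
        ]
    return centres, groups

# Hypothetical (petal length, petal width) pairs in cm, NOT the real iris data.
points = [(1.4, 0.2), (1.5, 0.3), (1.6, 0.4), (4.5, 1.4), (5.0, 1.6), (5.5, 1.8)]

# Assumed starting guesses: simply the first and last points.
centres, groups = kmeans(points, centres=[points[0], points[-1]])

for c, g in zip(centres, groups):
    print("centre", tuple(round(v, 2) for v in c), "has", len(g), "points")
```

Real data is messier than my six tidy dots, and the answer can depend on where the centres start, which is one reason a dot can end up in a group that looks wrong to the human eye.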
Clustering data allows for logical pairing or grouping. And machines can do such things fairly quickly through quite-good-yet-not-perfect algorithms. Imagine if we could sort students into those who might do well online and those who should have more traditional courses. Imagine if our thousands of students could be sorted in a first pass by machine into groups that are then given customised, meaningful, personal support.
Big data doesn’t have to mean impersonal. Human compassion can be informed by machine logic. And machines can help humans make sense of thousands of students and tens of thousands of data points. Take a look at some pretty cool and entirely free dashboard samplers. Do you have regular data you’d like to visualise? Would some of those dashboards, powered by your data rather than irises, help you better understand your students? Send me an email or give me a ring at x2468!