Radar Plots Can Be Misleading

Bottom Line Up Front: Spider or Radar Graphs aren't great for non-ordinal data. The way the data analyst chooses to arrange the data changes the message of the graph. There is no “objective truth” found in radar plots. Check out bar charts or parallel coordinate plots instead.

While enjoying a very refreshing beer (Pryes' Main Squeeze, from MPLS, MN), I noticed they used a radar or spider plot to describe the beer's characteristics. And this got me thinking about just how misleading spider or radar plots can be for data that is not ordinal (that is, where the characteristics have no natural order), perhaps in ways that other visualizations are not. (When the characteristics do have a natural order, such as foot traffic by hour of the day, spider plots aren't as likely to be problematic.)

So we see on our beer that it scores fairly high on Lemon and Lime, mid-range on Grapefruit, and lower on Biscuit, Earthy and Sweet, meaning this is a more "Citrus" than "Malty" beer. But look closely at the shaded area, especially the "citrusy" area underneath the Lemon line (highlighted in orange here).

This area is heavily influenced by the value of the adjacent characteristic, in this case Sweet. What happens if we swap the order of Earthy and Sweet? Now the graph looks like this, and the beer sounds a lot more Malty and a lot less Citrusy.

The way the data analyst chooses to arrange the data changes the message of the graph. There is no “objective truth” found in radar plots.

There are many different ways we could rearrange this graph, and each one might give a different impression about the beer. With six characteristics, there are 6!=720 ways to arrange the graph. You could argue that since it's circular, there are only 6!/6=5!=120 ways to arrange the graph, but I think that up/down matters here, so I like the bigger number. Either way, there are well over a hundred variations.
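To make the counting concrete, here is a minimal sketch that enumerates the orderings of the six beer characteristics. The helper name `canonical_rotation` is made up for this illustration; it treats two orderings as the same whenever one is just a rotation of the other around the circle.

```python
from itertools import permutations

# The six characteristics from the beer label.
labels = ["Lemon", "Lime", "Grapefruit", "Biscuit", "Earthy", "Sweet"]

def canonical_rotation(order):
    """Rotate an ordering so that labels[0] comes first (hypothetical helper)."""
    i = order.index(labels[0])
    return order[i:] + order[:i]

# Every ordering counts when the position on the page (up/down) matters.
all_orders = list(permutations(labels))

# Orderings that differ only by spinning the plot collapse into one.
up_to_rotation = {canonical_rotation(p) for p in all_orders}

print(len(all_orders))      # 720
print(len(up_to_rotation))  # 120
```

If you also treat mirror images as identical, the count halves again to 60, which is still plenty of room for different impressions.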

It's not just the arrangement that changes; the size of the shaded area changes too. Let's look at a scenario with six characteristics graded on a scale from 0 to 5. Assume the individual in question scored [A=0, B=0, C=1, D=2, E=4, F=5]. Here are three different ways to represent that data (by switching up the order in which the variables are plotted, i.e., should V1 be A or B?):

Remember that there are hundreds of ways to arrange this? Suppose that we assume “more area implies better performance.” Well, the total area depends on how you choose to arrange the graph, ranging in this case from about 2.5 to about 15, roughly a sixfold swing. Here's a breakdown of the different area sizes that could pop up for different spider plot arrangements. Remember that our data stayed the same; only the order in which we plot it changes.
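If you want to check the swing yourself, here is a rough sketch. It assumes each score is plotted as a distance from the center along six equally spaced axes, so the shaded polygon is a ring of triangles, each with area (1/2) × a × b × sin(60°) for adjacent scores a and b. The function name `radar_area` is made up for this example, and the plotting details behind the figures above may differ slightly.

```python
from itertools import permutations
from math import sin, pi

# Scores for A..F from the example above.
scores = [0, 0, 1, 2, 4, 5]

def radar_area(values):
    """Area of the radar polygon, assuming equally spaced axes."""
    n = len(values)
    wedge = sin(2 * pi / n) / 2  # triangle factor for one pair of adjacent axes
    # Sum the triangles formed by each pair of neighboring scores (wrapping around).
    return wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))

# Try every ordering of the same six scores.
areas = [radar_area(p) for p in permutations(scores)]

print(round(min(areas), 2))  # 2.6
print(round(max(areas), 2))  # 14.72
```

Under these assumptions the smallest and largest areas come out to about 2.6 and 14.7, in line with the range quoted above. Each triangle depends on the product of two neighboring scores, which is why putting a high score next to a zero wipes out its contribution to the area.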

So what should you do? I'd recommend either a bar graph or a parallel coordinates plot in most cases. They don't look as neat, maybe, but they're likely to be a more faithful representation of the data.
