
by Peter Moskos

October 2, 2015

"And they made a chart with no Y-axis!"

I'm a stickler for the honest presentation of data. Too many people, it seems to me, just don't care. I mean, it is easier to just make numbers up and share a picture on Facebook if it supports your ideological position.

When it comes to data analysis, I didn't expect to find an ally in late-night TV. So check this out.

If you don't have 7 minutes, watch from about 2:22, when Meyers (a Northwestern alum) talks about a misleading slide presented in a congressional hearing.

At 3:40 Meyers says, "Let's take a closer look at this graph." Let's. Because nothing says pure comic gold like data analysis. And Meyers nails it:

A) "There's a bigger number at bottom and a smaller number at the top."
B) "You can't have 2 million here and 300,000 there [in line with each other, horizontally]."
C) "And they made a chart with no Y-axis!"

Well played, Seth. If we ever made a bet about the words, "and they made a chart with no Y-axis" never being said on late-night TV, I guess I lose.

Update: Let's play with graphs a bit. Why not? It's fun.

Given the numbers above (which may be false), the chart should look like this:

What is "prevention services"? I don't know. Why pick one category that perhaps (probably) decreased a lot? Well, to mislead. And based on two minutes of online research, it seems more reasonable to look at the total number of patients and the number of abortions (the abortion numbers seem to be correct, by the way). Then the chart looks like this:

Of course this looks less dramatic. And that's exactly the point.
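To see why two counts can't honestly sit level with each other on one axis, here is a minimal Python sketch. It only computes how far up a shared y-axis each value belongs. The function name `height_fraction` and the 2.5 million axis ceiling are my own illustrative choices; the two figures are the ones quoted in the Meyers segment (which, again, may be false).

```python
def height_fraction(value, axis_min, axis_max):
    """Return how far up the plot area a value sits (0.0 = bottom, 1.0 = top)."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (value - axis_min) / (axis_max - axis_min)


# Figures as quoted in the segment; not verified here.
LARGE = 2_000_000
SMALL = 300_000

# One shared y-axis from 0 to 2.5 million (the ceiling is an arbitrary assumption).
AXIS_MIN = 0
AXIS_MAX = 2_500_000

shared_large = height_fraction(LARGE, AXIS_MIN, AXIS_MAX)
shared_small = height_fraction(SMALL, AXIS_MIN, AXIS_MAX)

print(f"2,000,000 belongs at {shared_large:.0%} of the chart height")
print(f"300,000 belongs at {shared_small:.0%} of the chart height")
```

On any single scale that starts at zero, the larger count lands several times higher than the smaller one. Drawing them side by side means a second, hidden scale is in play.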

Now keep in mind the charts above don't have 2 y-axes. There's just one: the number. To use two different scales for the same measurement is weird and suspicious. There are times when you do want to use 2 y-axes, but you can also use them to mislead. Take this:

The data are correct. But it's still intentionally misleading. Why? Because a reasonable interpretation would be that greater incarceration numbers correlate with fewer murders. Indeed, during this time period, they did. But why did I select this decade? Because it's the only decade where this is true. I cherry-picked the data. Not cool.

I mean, I could have picked any of these years:

Now homicide and incarceration are positively correlated! The more people we lock up, the more people kill each other. The facts have changed. And all the data are correct. This is where it's important to repeat that popular phrase: correlation does not equal causation.
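The window trick is easy to reproduce. Below is a rough Python sketch using two MADE-UP series (not real homicide or incarceration figures). One rises steadily; the other rises and then falls. The `pearson` helper is a plain textbook correlation written out by hand, and all the names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic, illustrative values only.
series_a = [1, 2, 3, 4, 5, 6, 7, 8]  # rises steadily
series_b = [4, 5, 6, 7, 6, 5, 4, 3]  # rises, then falls

early = pearson(series_a[:4], series_b[:4])  # first half of the window
late = pearson(series_a[4:], series_b[4:])   # second half of the window
whole = pearson(series_a, series_b)          # the full period

print(f"early window: {early:+.2f}")
print(f"late window:  {late:+.2f}")
print(f"whole period: {whole:+.2f}")
```

Same two series, same correct numbers, and the sign of the correlation depends entirely on which years you choose to show.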

But along with cherry picking data, I've done another misleading thing. I've changed the scale of the left y-axis: From 2000-2007 it goes from just 5.4 to 6.2! That's just me trying to intentionally mislead (for educational purposes only).

Of course there are choices and selections you have to make in any chart. Here's the same data but going back to 1983:

Both axes go down to zero. That's not necessary, but other things being equal, it's good.

I mean look at this crime drop in NYC:

Compare it to this one:

Of course it's the same data. It's just that on the first one the y-axis doesn't go to zero. It makes the drop look bigger. Is that misleading? Potentially. Depending on what your point is. If your point is to highlight the actual numbers, then it's fine. If your point is that homicide plummeted during those years (which it did), it would be somewhere between odd and misleading to start the y-axis at the lowest data point, because that seems to imply that murder dropped to zero.
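Here is a small Python sketch of that effect, assuming a simple line chart where a point's height is proportional to its distance above the bottom of the y-axis. The counts are hypothetical, not NYC data, and `apparent_drop` is a name I made up for illustration.

```python
def apparent_drop(first, last, axis_min):
    """Fraction of the first point's drawn height that visually disappears
    by the last point, given where the y-axis starts."""
    if first <= axis_min:
        raise ValueError("first value must sit above the bottom of the axis")
    return (first - last) / (first - axis_min)


# Hypothetical counts, for illustration only.
first, last = 1000, 400

zero_based = apparent_drop(first, last, axis_min=0)    # y-axis starts at zero
truncated = apparent_drop(first, last, axis_min=last)  # y-axis starts at the lowest point

print(f"axis from zero:         line falls {zero_based:.0%} of its height")
print(f"axis from lowest point: line falls {truncated:.0%} of its height")
```

With the axis starting at the lowest data point, the line always falls all the way to the floor of the chart, whatever the real size of the decline. That's the "dropped to zero" impression.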

Here are homicide and incarceration going back to 1925:

Now this is legit. The y-axis goes to zero. Nothing funny there. But why is it homicide rate and incarceration number? It turns out it's just easier to get homicide rates and incarceration numbers. And it so happens, I happen to know, that in this case it doesn't matter. The chart looks basically the same. But that switcheroo should still be a red flag to the discerning statistical consumer.

In the end, I use this:

Both y-axes are rates. No funny stuff there. I've also bolded the numbers and thickened the lines for better clarity. (It might also be nice to make the chart readable for black-and-white reproduction, by making one line dotted or something. But I don't like the way that looks. And I know I'll be showing this in color.)

Also note the left y-axis does not go to zero. That's a choice I made. It's not to mislead but to create a better visual presentation. The point I'm trying to make, based on the data, is that there isn't any inherent correlation between crime and incarceration. Homicides go up and down for whatever reason; incarceration is a political choice related to the war on drugs.

But the discerning reader might observe, "how the hell do you know numbers from 2015 when the year isn't over?!" Good point. I don't. I basically made an educated guess for the sake of visual clarity. Can I do that? Sure. I'll update the info next year when I do know. It matters that the specific 2015 data point isn't really important here. This is a choice based on my needs for this chart. I want the x-axis labeled at nice intervals. And if the data ends in 2014, it looks funny (like in the chart immediately above).

And last but not least:


Kyle said...

What? People do this all the time. Search google for "graph with 2 y-axis". It's a valid way of showing the relationship between two things that might not have the same scale or the same units (such as showing crime per capita and police per capita from 2000-2015). Why do they need a y-axis? There's only 2 data points on each line, and they're clearly labeled. If there were 2 sides, there would be 2 points on each side, with the same numbers at the same height.

john mosby said...

True, the Economist frequently has two-Y-axis graphs, for example comparing inflation to unemployment. But they generally don't have crossing trend lines. The scales are arranged so that you can see one went up and one went down, or whatever, in the same period. But there's no such thing as unemployment being equal to inflation, so the graph designer makes sure they don't cross.

When you have your trend lines cross, that implies that the two y-variables are equal. Here, they are not: eyeballing the graph, it looks like screenings were at about 1.5M, while abortions were at about 300K. Why is this the crossing point?

There might have been more impact if the figures were converted into dollars - then you could have a single y-axis, and there might indeed be some year where the money spent on abortions exceeded the money spent on prevention.

Or, with a common dollar measurement, you could do one of those dramatic 'one-curve-atop-another' graphs (I don't know the proper name for it), where the "pink" area under the curve shrinks while the "red area" grows as time goes by. That might have gotten the point across. I would imagine that abortions, as invasive procedures, cost more than mammograms or even pap smears.

Of course all this assumes that the people on both sides of the issue have a minimal level of quantitative reasoning, or even design sense.

Prof, I think you should have a weekly "Fun With Graphs" post....


Peter Moskos said...

Because the point is to mislead.

This chart doesn't have 2 y-axes. It has one y-axis: the number.

Peter Moskos said...

Assuming the numbers are even true (not a given), the point from looking at the graph is that Planned Parenthood now performs more abortions than cancer screenings and prevention services. That's what it means when one line is higher than the other. And that's not true.

Also, what JSM says.

Peter Moskos said...

Chart making is an art. But the purpose is to present data more clearly. And honestly. I always try and make the bottom of the Y axis 0, for instance. Otherwise lines can be misleading.

Kyle said...

And here what I thought is "abortions are going up while services are going down". I suppose I look skeptically at graphs, but the numbers are the biggest thing on the graph. (And I have no idea and make no claim as to the validity of the numbers.)

Peter Moskos said...

I added to the post. To compare apples and apples. It's the false intersection -- the point at 2010, when abortions seem to become their primary purpose -- that's the intentionally misleading part.