I'm a stickler for the honest presentation of data. Too many people, it seems to me, just don't care. I mean, it is easier to just make numbers up and share a picture on Facebook if it supports your ideological position.
When it comes to data analysis, I didn't expect to find an ally in late-night TV. So check this out.
If you don't have 7 minutes, watch from about 2:22, when Meyers (a Northwestern alum) talks about a misleading slide presented at a congressional hearing.
At 3:40 Meyers says, "Let's take a closer look at this graph." Let's. Because nothing says pure comic gold like data analysis. And Meyers nails it:
A) "There's a bigger number at bottom and a smaller number at the top."
B) "You can't have 2 million here and 300,000 there [in line with each other, horizontally]."
C) "And they made a chart with no Y-axis!"
Well played, Seth. If we had ever made a bet that the words "and they made a chart with no Y-axis" would never be said on late-night TV, I guess I'd lose.
Update: Let's play with graphs a bit. Why not? It's fun.
Given the numbers above (which may be false), the chart should look like this:
What is "prevention services"? I don't know. Why pick one category that perhaps (probably) decreased a lot? Well, to mislead. And based on two minutes of online research, it seems more reasonable to look at the total number of patients and the number of abortions (the abortion numbers seem to be correct, by the way). Then the chart looks like this:
Of course this looks less dramatic. And that's exactly the point.
Now keep in mind the charts above don't have 2 y-axes. There's just one: the number. To use two different scales for the same measurement is weird and suspicious. There are times when you do want to use 2 y-axes, but you can also use them to mislead. Take this:
The data are correct. But it's still intentionally misleading. Why? Because a reasonable interpretation would be that greater incarceration numbers correlate with fewer murders. Indeed, during this time period, they did. But why did I select this decade? Because it's the only decade where this is true. I cherry-picked the data. Not cool.
I mean, I could have picked any of these years:
Now homicide and incarceration are positively correlated! The more people we lock up, the more people kill each other. The facts have changed. And all the data are correct. This is where it's important to repeat that popular phrase: correlation does not equal causation.
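To make the cherry-picking point concrete, here is a minimal sketch in Python. The two series are invented for illustration (they are not the homicide or incarceration figures from the charts), and `pearson` and `window_correlation` are just hypothetical helper names. The only thing it shows is that the same pair of series can be negatively correlated, positively correlated, or uncorrelated, depending on which slice of years you choose to show.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical series, invented for illustration only (not real crime data).
series_a = [10, 9, 8, 7, 6, 5, 5, 6, 7, 8, 9, 10]   # falls, then rises
series_b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]  # rises throughout

def window_correlation(start, end):
    """Correlation of the two series over positions start..end-1."""
    return pearson(series_a[start:end], series_b[start:end])

early = window_correlation(0, 6)    # first half only
late = window_correlation(6, 12)    # second half only
full = window_correlation(0, 12)    # the whole range

print(f"early window: {early:.2f}")
print(f"late window:  {late:.2f}")
print(f"full range:   {full:.2f}")
```

With these made-up numbers, the early window is strongly negative, the late window is strongly positive, and the full range shows essentially no linear relationship. Same data, three different stories.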
But along with cherry-picking data, I've done another misleading thing. I've changed the scale of the left y-axis: from 2000 to 2007 it goes from just 5.4 to 6.2! That's just me trying to intentionally mislead (for educational purposes only).
Of course there are choices and selections you have to make in any chart. Here's the same data but going back to 1983:
Both axes go down to zero. That's not necessary, but other things being equal, it's good.
I mean look at this crime drop in NYC:
Compare it to this one:
Of course it's the same data. It's just that on the first one the y-axis doesn't go to zero. It makes the drop look bigger. Is that misleading? Potentially. Depending on what your point is. If your point is to highlight the actual numbers, then it's fine. If your point is that homicide plummeted during those years (which it did), it would be somewhere between odd and misleading to start the y-axis at the lowest data point, because that seems to imply that murder dropped to zero.
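Here is a rough way to put a number on that effect, as a sketch rather than any standard metric. The hypothetical function `apparent_drop` measures how far the line falls as a fraction of its plotted height, given where the y-axis starts. The counts are invented for illustration and are not the NYC figures.

```python
def apparent_drop(first, last, axis_min=0.0):
    """Fraction of the plotted height that the line falls from first to last.

    axis_min is where the y-axis starts. With axis_min=0 this is the real
    proportional drop; a higher axis_min exaggerates it.
    """
    plotted_first = first - axis_min
    plotted_last = last - axis_min
    if plotted_first <= 0 or plotted_last < 0:
        raise ValueError("axis_min must sit at or below the plotted values")
    return (plotted_first - plotted_last) / plotted_first

# Hypothetical counts, invented for illustration only.
first, last = 2000, 500

actual = apparent_drop(first, last)                     # y-axis starts at zero
truncated = apparent_drop(first, last, axis_min=400)    # y-axis starts at 400
at_lowest = apparent_drop(first, last, axis_min=last)   # y-axis starts at the lowest point

print(f"axis from zero:    {actual:.0%}")
print(f"axis from 400:     {truncated:.0%}")
print(f"axis from minimum: {at_lowest:.0%}")
```

When the axis starts at the lowest data point, the line visually falls the full height of the chart, which is exactly why it seems to imply a drop to zero.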
Here are homicide and incarceration going back to 1925:
Now this is legit. The y-axis goes to zero. Nothing funny there. But why is it homicide rate and incarceration number? It turns out it's just easier to get homicide rates and incarceration numbers. And I happen to know that in this case it doesn't matter. The chart looks basically the same. But that switcheroo should still be a red flag to the discerning statistical consumer.
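Why is mixing a rate with a raw number a red flag in general? Because a raw count and a rate can move in opposite directions when the population changes. A small sketch, with invented counts and populations and a hypothetical helper `rate_per_100k`:

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population

# Hypothetical figures, invented for illustration only.
count_then, population_then = 1_000, 100_000_000
count_now, population_now = 1_500, 200_000_000

rate_then = rate_per_100k(count_then, population_then)
rate_now = rate_per_100k(count_now, population_now)

count_change = (count_now - count_then) / count_then
rate_change = (rate_now - rate_then) / rate_then

print(f"count change: {count_change:+.0%}")
print(f"rate change:  {rate_change:+.0%}")
```

In this made-up case the count rises by half while the rate falls by a quarter, because the population doubled. Whoever picks which of the two to plot gets to pick the story.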
In the end, I use this:
Both y-axes are rates. No funny stuff there. I've also bolded the numbers and thickened the lines for better clarity. (It might also be nice to make the chart readable for black-and-white reproduction, by making one line dotted or something. But I don't like the way that looks. And I know I'll be showing this in color.)
Also note the left y-axis does not go to zero. That's a choice I made. It's not to mislead but to create a better visual presentation. The point I'm trying to make, based on the data, is that there isn't any inherent correlation between crime and incarceration. Homicides go up and down for whatever reason; incarceration is a political choice related to the war on drugs.
But the discerning reader might observe, "How the hell do you know numbers from 2015 when the year isn't over?!" Good point. I don't. I basically made an educated guess for the sake of visual clarity. Can I do that? Sure. I'll update the info next year when I do know. It matters that the specific 2015 data point isn't really important here. This is a choice based on my needs for this chart. I want the x-axis labeled at nice intervals. And if the data ends in 2014, it looks funny (like in the chart immediately above).
And last but not least: