More than two months passed. The inaccurate 21-to-1 figure was bandied about by NPR, the New York Times, and The Economist.
Then, on a quiet Christmas Eve, ProPublica's Ryan Gabrielson and Ryann Grochowski Jones posted an article to address criticism (mainly brought by me and David Klinger) of their initial study.
I don't want to waste much more time on this; I've wasted too much already (see 1, 2, 3, and 4). But I do find it funny that in their piece, after many paragraphs focusing on the red-herring non-issue of Hispanic undercount, there it is -- buried in the 11th paragraph -- they kind of admit I'm right: the ratio might be 9 to 1!
Maybe I should just stop there and say, "you're welcome."
But, but, I can't! Because then there it is -- a revisionist gem -- they say the actual number doesn't really matter: "And whether 9 times as great, 17 times or 21 times, the racial disparity remains vast, and demands deeper investigation."
What the fuck?!
The 21-times ratio is the only real point of your original article (which is still up and unapologetic)! And the only real point of my bitching was that 21-times is wrong. Now even 4:1 or 9:1 may be too large. And it does demand deeper investigation. So why not investigate deeper (or at least crib from those who have)?
According to ProPublica: "the data is far too limited to point to a cause for the disparity." Actually, no. The disparity can be explained pretty well, without too much "deep" digging. What I'm about to tell isn't the "deepest" investigation, mind you, but it's a start. And it's on me, guys. Gratis.
The black-to-white racial disparity (all ages) of those killed by cops since 2000 (and reported to the UCR, which is a big caveat) is 4 to 1. The racial disparity among those who kill cops is 5 to 1 (the rate is per capita, mind you, not the absolute number). I'd bet $20 it holds for teens, too.
Now one could say, as does Prof. Klinger, that the data on police-involved homicides are simply too limited to make any point at all. But if one is willing to play with bad data (and I'm game, if they're the best we got), then you can't say your conclusion is fine but... other conclusions? ...well, "the data is far too limited."
Finally -- and it goes back to my point about outliers and cherry-picked bullshit data -- ProPublica has the chutzpah to say they can't go back further in time -- thus including more data, increasing statistical validity, and decreasing the magnitude of their conclusion -- because, get this: they can't get accurate population numbers.
So let me get this right: they're fine using fucked-up UCR data on justified police-involved homicide, they're fine cherry-picking an outlier three-year sample with an "n" (total cases) of 62, but they wouldn't dare look at more years because we can't estimate the US population between 2001 and 2007? Are they on crack? Are they stupid? Or are they simply blinded by ideological bias? I honestly do not know. But it's a nonsensical line of statistical integrity for them to draw.
Here it is in their words:
"Using Census 2000 and Census 2010 data for baselines assumes that the ratio of populations remain static, and that a snapshot of population rates for a subset of time can be assumed to be accurate for an entire period. We know that's not true.... To test the critics' argument, we calculated risk ratios for as far back as the American Community Survey data goes (2008) [ed note: the ACS actually goes back to 2005, but whatever]. From 2006 to 2008, the risk ratio was 9.1 to 1 (with a 95 percent confidence interval 6.19, 13.39)."
First of all, stop the fancy talk about "risk ratio" and "confidence interval." You either don't know what you're talking about or you're knowingly trying to mislead.
Speaking above your reader's head is a dirty rhetorical trick to hoodwink gentle readers into trusting your statistical acumen (which is pretty crappy). As my grand pappy used to say, "Ain't no need to use a 25-cent word when a 5-cent one will do." (See, now I'm usin' the reverse rhetorical trick by affectin' an aw-shucks-I'm-just-a-common-guy style of speech here.) For what it's worth, my papou was an immigrant who spoke with a Greek accent.
"Risk ratio" here means nothing more than "more likely." "Confidence interval," well, if you're going to use it, explain it. Better yet, explain it accurately* or at least point out that it supports 9:1 more than 21:1.
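For anyone who wants to check the "supports 9:1 more than 21:1" bit, here is a rough sketch in Python. It uses only the three numbers ProPublica published for 2006 to 2008 (9.1, 6.19, 13.39). The assumption, mine and not theirs, is that the interval was built the usual way, symmetric on a log scale with a 1.96 multiplier. The variable names are made up for illustration.

```python
import math

# Figures quoted from ProPublica's follow-up for 2006-2008.
risk_ratio = 9.1
ci_low, ci_high = 6.19, 13.39

# If the interval is symmetric on the log scale (an assumption),
# its geometric midpoint should land on the point estimate.
geometric_mid = math.sqrt(ci_low * ci_high)

# Back out the implied standard error of log(risk ratio),
# assuming the conventional 1.96 multiplier for a 95 percent interval.
se_log = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# How many standard errors above the 2006-2008 estimate is 21 to 1?
z_for_21 = (math.log(21) - math.log(risk_ratio)) / se_log

print(f"geometric midpoint of the interval: {geometric_mid:.2f}")
print(f"implied SE of log risk ratio: {se_log:.3f}")
print(f"21:1 sits {z_for_21:.1f} standard errors above 9.1:1")
print("21 inside the interval?", ci_low <= 21 <= ci_high)
```

In plain words: the interval says the 2006 to 2008 data are consistent with "more likely" by a factor of roughly 6 to 13, and 21 is well outside that range. It says nothing about other years, and nothing about how good the underlying UCR counts are.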
More to the point, it's pointless to discuss statistical nuances of irrelevancy! Of all the problems in your analysis, you're going to draw the line at estimating population in Census off-years? Really?! It's like we're sitting in your rusted jalopy and you tell me you can't drive me home because the windshield wipers aren't working. But you failed to mention the fact that the engine is broken!
Of course we can estimate population figures, you fools! The US population grew 9.7% between 2000 and 2010. Talk about easy math! Go on, be bold, you dirty devil: assume a linear population growth for all categories. Divide 10% by 10. It comes out to 1% a year. I know it's not perfect, but it'll be close enough; trust me. (Actually, population growth of 9.7% over 10 years comes out to about 0.93% a year, compounded.)
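The back-of-the-envelope math above fits in a few lines of Python. This is a sketch, not anybody's official method: the population is an index (2000 = 100) rather than real Census counts, the only real input is the 9.7% growth figure, and the function names are invented for illustration.

```python
import math

GROWTH_2000_TO_2010 = 0.097  # total growth over the decade, from the text
YEARS = 10

def linear_estimate(pop_2000, pop_2010, year):
    """Straight-line interpolation between the two census years."""
    frac = (year - 2000) / YEARS
    return pop_2000 + frac * (pop_2010 - pop_2000)

def geometric_estimate(pop_2000, pop_2010, year):
    """Constant-percentage-growth interpolation, for comparison."""
    frac = (year - 2000) / YEARS
    return pop_2000 * (pop_2010 / pop_2000) ** frac

# Three ways to turn 9.7% per decade into a yearly rate.
simple_rate = GROWTH_2000_TO_2010 / YEARS                      # divide by 10
annual_rate = (1 + GROWTH_2000_TO_2010) ** (1 / YEARS) - 1     # compounded yearly
continuous_rate = math.log(1 + GROWTH_2000_TO_2010) / YEARS    # compounded continuously

# Index the 2000 population at 100 so no real counts are needed.
pop_2000, pop_2010 = 100.0, 100.0 * (1 + GROWTH_2000_TO_2010)
mid_linear = linear_estimate(pop_2000, pop_2010, 2005)
mid_geometric = geometric_estimate(pop_2000, pop_2010, 2005)
gap = abs(mid_linear - mid_geometric) / mid_geometric

print(f"simple: {simple_rate:.3%}  annual: {annual_rate:.3%}  continuous: {continuous_rate:.3%}")
print(f"2005 estimate, linear vs. geometric: {mid_linear:.2f} vs. {mid_geometric:.2f}")
print(f"relative gap at mid-decade: {gap:.3%}")
```

The two interpolation methods disagree most at mid-decade, and even there the gap is a small fraction of one percent. Run it separately for each racial group's census counts and you get the off-year denominators.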
Will this population estimate be perfect? No. Is it good enough? Yes. Will it tell you far more about what you claim to show? Of course. Is that why you won't do it? Probably. Would this population estimate be the single most accurate number in your entire analysis? Abso-fucking-lutely.