
by Peter Moskos

September 8, 2017

Still trying to explain...

What's wrong with the Brennan Center's analysis? There are many problems. But here are a few:

1) They take a non-random sample (which isn't bad in and of itself) and then A) don't tell the reader in the text and B) state conclusions as if it were a random sample (every data point having an equal chance of being picked) representative of the nation.

2) They use short time frames (1 year) and then point out that fluctuations could be random. True. For a short time frame. They could take a longer time frame (3 years) and see patterns develop more clearly.

3) This is a bit trickier to explain, and that's why I'm giving it another shot. They base their findings on the magnitude of changes within their sample. This has the perverse effect of producing attention-getting conclusions ("more than half") that are noteworthy only in direct proportion to the limitations of their sample.

Let's take an analogy. I want to look at murder in my City of Moskopolis (a fine city, despite a bit of a crime problem). So I take a sample of two police districts (out of ten equally sized police districts). Now it just so happens that we already know that murder in Moskopolis is up 20 percent. But our study looks at District #1, where murder is up 30 percent, and District #2, where murder is up 10 percent.

Now maybe District #1 is important for its own reasons. "Murder is up 30 percent in District #1." No problem there. Or maybe, as mayor of Moskopolis, I prefer to give it a bit of spin: "Murder is up 30 percent in District #1, but not so much in the rest of the city." That's fine, too.

But I can't say this: "District #1 accounts for 75 percent of the murder increase in Moskopolis." This is not true. It is false. District #1 accounts for 15 percent of the city's murder increase.
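To check the arithmetic, here's a quick sketch in Python. It assumes that "equally sized" means every district had the same number of murders last year; the figure of 100 per district is hypothetical, and any equal baseline gives the same percentages because it cancels out.

```python
# Hypothetical baseline: each district had the same number of murders last year.
BASELINE = 100  # assumed murders per district (illustrative, not real data)
N_DISTRICTS = 10

def increase(pct_change, baseline=BASELINE):
    """Absolute change in murders for a given percent change."""
    return baseline * pct_change / 100

d1 = increase(30)  # District #1: up 30 percent
d2 = increase(10)  # District #2: up 10 percent
city = increase(20, baseline=BASELINE * N_DISTRICTS)  # whole city: up 20 percent

share_of_sample = d1 / (d1 + d2)  # District #1's share of the sample's increase
share_of_city = d1 / city         # District #1's share of the citywide increase

print(f"Share of sample increase:   {share_of_sample:.0%}")
print(f"Share of citywide increase: {share_of_city:.0%}")
```

The 75 percent figure only describes District #1 relative to District #2. Measured against the whole city, the same 30 extra murders are 15 percent of the increase.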

But some guy who has a stick up his ass about accurate data (even though he really does have better things to be doing with his time) gets all huffy and points out this inconvenient truth to the Washington Post, which quoted my incorrect statement because I'm generally a trustworthy guy.

So the Washington Post calls me and says "What's up?"

"Oh," I say. "I'm sorry. I was talking about 75 percent in my sample. Did I not make that clear?"

The Washington Post dutifully makes the correction and updates the story: "District #1 accounted for 75 percent of the murder increase in two districts."

This is now no longer a false statement, but it's still a meaningless one. Who cares what percentage of the change comes from one district in my sample? Why are we talking about two districts when we could be talking about six, eight, or even all ten of them? And here's a doozy: What if murder went down in District #2? Could District #1 account for more than 100 percent of the increase in my sample? Mathematically, yes, says my calculator. But statistically, one district accounting for more than 100 percent of an increase is absurd. Methodologically, this should be a big red flag.
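Here's what the calculator says, as a small Python sketch. The 10 percent drop in District #2 is hypothetical, and so is the equal baseline of 100 murders per district; the point is only that a "share of the increase" blows past 100 percent as soon as one district in the sample goes the other way.

```python
BASELINE = 100  # hypothetical murders per district last year (same for both)

def share_of_increase(pct_changes, district):
    """Fraction of the sample's net increase attributed to one district."""
    changes = [BASELINE * p / 100 for p in pct_changes]
    net = sum(changes)
    if net == 0:
        # If the districts cancel out exactly, the "share" isn't even defined.
        raise ValueError("no net change in the sample; share is undefined")
    return changes[district] / net

# District #1 up 30 percent, District #2 down 10 percent (hypothetical).
share = share_of_increase([30, -10], district=0)
print(f"District #1 share of the sample's increase: {share:.0%}")
```

District #1 adds 30 murders, District #2 loses 10, so the sample's net increase is 20 and District #1 "accounts for" 150 percent of it. A number like that says more about the sample than about murder.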

Anyway, Moskopolis is still a fine place. And indeed, we shouldn't overreact to an increase in murder. But if the mayor says murder isn't up, perhaps you shouldn't believe the mayor.
