headsup: the blog: More stupid stuff not to do with polls

Here's another good example of garden-path journalism -- writers being led up the garden path by something they'd desperately like to be true, even if all the available evidence suggests that there is no there there:

This could be huge for Santorum. I’m guessing people in Iowa like what he says, but needed permission to support him in the form of some assurance that their votes wouldn’t be wasted. If he’s trending upwards in the polls, they get that permission.

OK, it's the National Review shilling for a hard-right candidate, but this isn't a fault of partisanship. NR is doing what journalism does: hammering the data into a story line it wants to see, rather than asking the data what the story should look like. That's not a partisan issue, but it is an ideological one. We won't fix it by demanding that National Review* stop inflating the appeal of repellent sleazeballs; we can begin to address it if we ask our news organizations to stick with the numbers and treat the campaign "story line" as the cultural fiction that it is.

To assess all that, let's have a look at how and under what conditions some (otherwise rational) news organization might want to claim that "Santorum jumps to third." On to some recent polling results, for which we'll draw on the data kept at RealClearPolitics.

Once again, never -- that's "never," as in never -- allow the "RCP average" to be quoted in a news article or grownup opinion piece. It is a meaningless number, as is every "average" of polls involving different populations and different sample sizes. But the crosstab does provide some useful information that helps illustrate how random sampling works and what it does and doesn't illuminate. For that, RCP is genuinely useful:

To put this into context, it helps to remember what samples do, and the most important thing random samples do is cluster in a roughly normal distribution around the true population value. If 42.5% of registered voters believe that pigs can fly and you regularly ask 400 randomly chosen registered voters whether pigs can fly, you'll usually get a "yes" result that's fairly close to 42.5%. Indeed, about two-thirds of the time (within one standard error, roughly 2.5 points either way), you'll get a result between 40% and 45%. If you roughly double that width, to 1.96 standard errors, the (maximum) "margin of error" will be 4.9 percentage points -- meaning that 95 percent of the time, somewhere between 37.6% and 47.4% of your 400 registered voters will agree that pigs can fly.
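If you'd rather watch that happen than take it on faith, here's a minimal simulation sketch. It assumes the pig-flight believers really are 42.5% of the population, draws thousands of independent random samples of 400, and counts how often the poll lands in each range. The function names and the fixed seed are made up for illustration.

```python
import random

def simulate_polls(p=0.425, n=400, trials=5000, seed=1):
    """Return poll results (in percent) from repeated random samples of size n.

    Assumes each respondent independently says "yes" with probability p.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(trials):
        yes = sum(1 for _ in range(n) if rng.random() < p)
        results.append(100 * yes / n)
    return results

def share_within(results, lo, hi):
    """Fraction of simulated polls landing between lo and hi percent, inclusive."""
    return sum(lo <= r <= hi for r in results) / len(results)

polls = simulate_polls()
print(f"between 40% and 45%:     {share_within(polls, 40.0, 45.0):.2f}")
print(f"between 37.6% and 47.4%: {share_within(polls, 37.6, 47.4):.2f}")
```

Expect the first figure to come out around two-thirds (a bit over, because a 400-person poll can only move in quarter-point steps and the endpoints count) and the second to come out near 0.95.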

Based on the samples shown in the second illustration, then, we could plausibly say that Santorum has "jumped to third." Or that he's "plummeted to fifth." Or that he's "hanging right around a tie for fourth." (Or, of course, that Romney has leapt into the lead, which is statistically as valid as Santorum's jump into third and should be at least as interesting, if you take this sort of crap seriously.) All those are ways of describing the results we see, and the more exciting they sound, the more likely they are to be wrong.
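One way to check whether a "jump" in the rankings means anything is to ask whether the gap between two candidates in the same poll is bigger than that gap's own margin of error. The sketch below uses the standard variance of a difference between two shares drawn from one sample, (a + b - (a - b)^2) / n. The function name and the 13%-vs-10% shares are hypothetical, not numbers from the polls discussed here.

```python
import math

def lead_is_meaningful(share_a, share_b, n, z=1.96):
    """Does the gap between two candidates in ONE poll exceed its margin of error?

    share_a, share_b: proportions (0 to 1) from the same sample of size n.
    Variance of the difference within one sample: (a + b - (a - b)**2) / n.
    Returns (verdict, margin of the gap in percentage points).
    """
    gap = share_a - share_b
    margin = z * math.sqrt((share_a + share_b - gap ** 2) / n)
    return abs(gap) > margin, round(100 * margin, 1)

# Hypothetical shares for illustration only
print(lead_is_meaningful(0.13, 0.10, 400))  # prints: (False, 4.7)
```

A three-point "lead" in a 400-person poll sits comfortably inside a 4.7-point margin, which is another way of saying the two candidates are tied.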

Take it to the bank, kids. If five roughly simultaneous small-N polls of the same population show that a candidate's support is at 10, 16, 10, 3, and 10, the smart money is on 10, with a side bet on stupid, confused, and nonrepresentative.** The National Review is grasping at straws here, but it's grasping at the same sorts of straws that our supposedly professional journalists grasp at too.
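Here's a rough sketch of that bet, using the five readings above. It assumes (hypothetically; the post says only "small-N") that each poll sampled 400 people, takes the median as the consensus, and flags any reading farther from it than the maximum margin of error.

```python
import math
import statistics

readings = [10, 16, 10, 3, 10]  # the five poll results from the text, in percent
n = 400  # hypothetical sample size for each poll

consensus = statistics.median(readings)
margin = 100 * 1.96 * math.sqrt(0.25 / n)  # maximum margin of error, in points

# Readings too far from the consensus to be ordinary sampling noise
outliers = [r for r in readings if abs(r - consensus) > margin]
print(consensus, round(margin, 1), outliers)  # prints: 10 4.9 [16, 3]
```

The median is used here because it shrugs off one or two oddball polls, which a plain average would not.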

Need some free advice? Here's how to calculate the "margin of error": First, start with a random sample. Divide 0.25*** by the number of cases in the sample (with a sample of 400, you get 0.000625). Take the square root of that (0.025), then multiply by 1.96 to get the margin of sampling error at 95% confidence: 0.049, or 4.9 percentage points.
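For editors who'd rather not click through the calculator every time, the recipe fits in a few lines. This is a minimal sketch; the function name is made up, and it assumes a simple random sample.

```python
import math

def max_margin_of_error(n, z=1.96):
    """Maximum margin of sampling error, in percentage points, for a sample of n.

    Divide 0.25 by n, take the square root, multiply by z (1.96 for 95%).
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 100 * z * math.sqrt(0.25 / n)

for n in (400, 600, 1000):
    print(n, round(max_margin_of_error(n), 1))
# prints:
# 400 4.9
# 600 4.0
# 1000 3.1
```

Note how slowly the margin shrinks: because of the square root, cutting it in half takes four times as many interviews.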

It's no harder than calculating an earned-run average, except that back in the good old days of baseball we didn't have calculators. Well, we do now. Editors who deal with political surveys should own the "margin of error" as thoroughly as the sports desk owns ERAs and slugging percentages, and the calculations are about three clicks away on your "start" button.

One logical consequence of that basic approach to statistical competence is that we should begin every discussion about who's leading or diving or treading water in any campaign with a few simple questions: Sez who? On what evidence? Who cares? Whatever your perspective, that's how you should go after the next story about who's planted his or her flag in the smoking ruins of what enemy capital, and on what statistical grounds. You might find the results surprising.

* I'd like to think Bill Buckley would be repelled by NR's enthusiastic embrace of overt, knuckle-dragging racism, but that's a distinctly non-empirical opinion.
** Sorry if that sounds like the modern American electorate or the modern American newsroom.
*** There's a reason for this: a 50-50 split (p = 0.5) makes p(1 - p) as large as it can get, 0.25, which produces the widest possible confidence interval, so this is the "maximum margin of sampling error."

Labels: clues, politics, polls

3 Comments:

tiffany jewellery said...: Fantastic goods from you, man. Ive study your stuff ahead of and youre just as well amazing. I enjoy what youve got right here, adore what youre stating and the way you say it. You make it entertaining and you even now manage to help keep it wise. I cant wait to go through additional from you. That is really an incredible weblog.; 2:16 AM, December 30, 2011
Walt Taylor said...: I also cant wait to go through additional from you, the Daily Show of the newspaper biz.; 7:33 AM, December 30, 2011
Anonymous said...: I think perhaps a more useful rule (or at least one that would be easier to explain to Joe and Jane Reader) would be to rank candidates in order of the probability that they are actually the most popular among the population being sampled. Bonus points for publishing the actual election results the same way. (It should be possible to do this by comparing your exit-poll sample with an all-eligible-voters sample.); 7:57 PM, December 30, 2011