Monday, March 03, 2008

Stop making stuff up!

We've complained a lot in the past few weeks about interchangeable heds: the ones you could (and often do) put on any story without regard for whether GLOVES COME OFF! or MAKE OR BREAK! reflects any actual development in the story atop which it appears.

Today's example is a little different. "Polls: Ohio, Texas races tighten" looks as if it was just pulled out of the bag of day-before-the-primary heds, but it has a singular disadvantage: It isn't true. Or, put more roundaboutly, of several things you could reasonably conclude from current survey data, "these two races are tightening" isn't any of them. So before we get into the gory statistical details, a couple of suggestions, in case anyone wants to listen:

1) When you write a hed, make sure you can (a) place a finger on the fact statement in the text that corresponds to the hed and (b) confirm the fact in line with prevailing standards.

2) Don't write a flawed or overdramatized hed because a designer painted you into a corner. If the layout doesn't give you the specifications for the hed the story needs, demand spex that conform to the story. Don't make the story fit the spex.

3) If you're going to generalize from samples to populations (and whether you like it or not, that's what every poll story since Caesar's "Gallic Wars" has done), learn the rules, all right? You don't have to know many, but you have to know them, and you have to be ruthless in editing stories that violate them for whatever reason. Honest. If sports writers treated their data as badly as we treat ours, the peasants would be at the gates.

Back to our hed. For it to be true, two things have to be true: Polls have to have tightened in Ohio, and they have to have tightened in Texas. Did they? Let's take a look. For context, let's use what the paper reported Saturday. Remember, the present tense in heds ("tighten") signals the event in the immediate past that tells you why today is different from yesterday -- the "why am I reading this?" development:
Polls now give her a modest lead in Ohio and show Texas is a toss-up, but earlier she had large leads in both states. (AP)

and inside Monday (reefered from the package shown above):
The race is close; a Mason-Dixon poll for the Cleveland Plain Dealer released on Sunday put Clinton ahead by four points, 47-43 percent, with an error margin of four points and 9 percent undecided. (Philly Inquirer)

So in the Mason-Dixon poll (in the field Feb. 27-29, interviewing 625 likely voters), Clinton leads in Ohio, but the lead isn't significant at conventional levels of confidence.* What has happened since? Let's have a look at the compilation at RealClearPolitics (which, if you discount the thoroughly bogus "RCP average," is a good one-stop site for data). Four polls of likely voters went into the field over the weekend: SurveyUSA (n=873; Clinton 54, Obama 44), Public Policy Polling (n=1,112; Clinton 51, Obama 42) and Suffolk (n=400; Clinton 52, Obama 40), all in the field Saturday and Sunday, and Rasmussen (n=858; Clinton 50, Obama 44), which polled Sunday only. In the first three, Clinton's lead is significant at 95% confidence, meaning it's too big to be plausibly explained by sampling error alone. In Rasmussen, Clinton's lead isn't significant, but it's larger than the difference in the Mason-Dixon poll.
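
If you want to check that arithmetic at home, here's a back-of-the-envelope sketch in Python. It assumes simple random sampling and ignores weighting, design effects and the undecideds -- things real pollsters can't ignore -- so take it as an illustration of the logic, not a re-creation of anybody's methodology:

    # Rough significance check for a lead between two candidates
    # measured in the same poll (simple-random-sampling assumption).
    from math import sqrt

    polls = {  # name: (sample size, Clinton %, Obama %)
        "Mason-Dixon": (625, 47, 43),
        "SurveyUSA": (873, 54, 44),
        "PPP": (1112, 51, 42),
        "Suffolk": (400, 52, 40),
        "Rasmussen": (858, 50, 44),
    }

    for name, (n, c, o) in polls.items():
        p1, p2 = c / 100, o / 100
        # Standard error of the difference between two proportions
        # estimated from the same sample (multinomial case).
        se = sqrt((p1 + p2 - (p1 - p2) ** 2) / n)
        z = (p1 - p2) / se
        print(f"{name:12s} lead {c - o:2d} pts  z = {z:.2f}  "
              f"significant at 95%: {z > 1.96}")

Run it and Mason-Dixon and Rasmussen fall short of the 1.96 bar while the other three clear it comfortably -- which is all "significant at 95% confidence" means here.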

So what does it all mean? It could mean everything's pretty stable. Estimates from repeated random samples scatter predictably around the true population value, so the data could mean Clinton's support in the whole population (likely Democratic voters in Ohio) has been steady at, oh, 48 to Obama's steady 43. Or, given that the most recent surveys mostly show a significant Clinton lead, Clinton's support could be growing. The one thing you can't get out of these data is that the race is tightening. It may be, for any of several reasons ("undecided" being a rather important one). But it isn't something that the polls show, meaning the hed is false.**
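
To see why "pretty stable" is a live option, here's a little simulation sketch (Python again; the 48-43-9 split is made up for illustration, and it again assumes simple random sampling). It draws a thousand pretend polls from a population that never changes and watches how much the Clinton "lead" bounces around anyway:

    # Simulated polls from a population that isn't moving at all.
    import random

    random.seed(1)
    TRUE_SPLIT = {"Clinton": 48, "Obama": 43, "Undecided": 9}

    def one_poll(n):
        # One pretend poll of n likely voters; returns Clinton's lead in points.
        sample = random.choices(list(TRUE_SPLIT),
                                weights=list(TRUE_SPLIT.values()), k=n)
        return 100 * (sample.count("Clinton") - sample.count("Obama")) / n

    leads = [one_poll(700) for _ in range(1000)]
    print(f"population lead: 5 points; simulated leads run from "
          f"{min(leads):+.1f} to {max(leads):+.1f}")

The population never moves, but the pretend polls sprawl across a wide range of leads -- which is why one survey's 4 and another's 10 don't, by themselves, show anything tightening or widening.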

Are the later polls better? Well, you can say with a high level of confidence that they're later. Most of them have larger samples than Mason-Dixon (meaning tighter confidence intervals). They're also in the field at the weekend only and for shorter time periods, introducing other possible sources of error we can't quantify. If that tells you anything about survey data on voter intentions, it should be that the things that make polls interesting also make them very difficult to draw confident predictions from. That doesn't mean polls are unreliable; it means the things that limit their reliability need to be taken into account.
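
For the curious, the arithmetic behind "larger samples mean tighter confidence intervals" looks like this -- the familiar plus-or-minus figure at 95% confidence for a percentage near 50, under the textbook simple-random-sampling assumption (real surveys carry design effects that widen it):

    # Margin of sampling error at 95% confidence for a single
    # percentage near 50, as a function of sample size.
    from math import sqrt

    for n in (400, 625, 858, 873, 1112):
        moe = 1.96 * sqrt(0.25 / n) * 100
        print(f"n = {n:4d}: +/- {moe:.1f} points")

Going from Suffolk's 400 to Public Policy Polling's 1,112 buys you about two points of precision. Helpful, but it does nothing about the weekend-only field period.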

And Texas? Look for yourself. Those results are only tightening if -- oh, if you're a hed writer who has to say something novel about the primaries and knows there's no penalty for inventing a few conclusions from data you haven't seen. Ahem.

It'd be nice if we didn't have to return to this subject every few weeks, but Some People are Not Paying Attention. The polls are getting a lot of stick in this campaign season, and that's not fair. The polls are performing exactly the way they're supposed to. The fault is with people -- pollsters, candidates, "experts," reporters -- who decide to put lipstick on an otherwise fully functional data pig and parade it around to demonstrate their indispensable wisdom. You can't spell Marketplace of Ideas without "marketing." At least, not very well.

* Though it would be at 68% confidence! How many times do we have to repeat this? The margin of sampling error is meaningless without its confidence level.
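
A quick illustration, using the textbook simple-random-sampling formula for Mason-Dixon's sample of 625:

    # Same sample, two confidence levels.
    from math import sqrt

    se = sqrt(0.25 / 625) * 100      # worst-case standard error, in points
    print(f"95% margin: +/- {1.96 * se:.1f} points")   # about +/- 3.9
    print(f"68% margin: +/- {1.00 * se:.1f} points")   # about +/- 2.0

The reported "four points" is the 95% figure; quote the number without the confidence level and it could mean nearly anything.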
** Clinton's complaints about unfair press coverage are silly to the point of being delusional. But if she wanted to complain that this hed gave a false impression of her campaign's standing, she'd be right.
