Sunday, December 30, 2007

When bad things happen to good data

There's an innocent way and a less innocent way to interpret this widespread report of survey data:

John Edwards has clawed his way into contention to win Iowa's caucuses Thursday in the first vote for the Democratic presidential nomination, according to a new McClatchy-MSNBC poll.

Innocent way first: Writers and editors handling campaign coverage for McClatchy and its member papers are less concerned with what survey results might actually indicate than with whether they can be woven into a dramatic, culturally congruent story line. (That's sort of how some of our more suspicious neighbors look on journalism in general, and given that we keep passing them the ammunition, it's hard to blame them.) Look back a few weeks -- "Whipsawed by an increasingly heated campaign, Democrats in Iowa and other early voting states are closely divided over which of their three top candidates to support, according to a new series of polls for McClatchy and MSNBC" -- and you could easily form the impression that the main function of survey data is to let writers use really perky verbs and lots of adjectives.

And the less innocent interpretation? Since the two papers represented above have both declared that they're going to stack their coverage toward the home-state candidate Edwards ("The bias is deliberate," as the N&O's public editor so gently put it), they're deliberately cooking the results to reflect their ideological interests.

The innocent interpretation is more likely. But neither is especially good for the long-term future of newspaper journalism, and if you don't want people to think the worst of you, why make it so blinking easy?

Now's the time to change the channel if you don't want to hear the usual complaints about misinterpretations of perfectly innocent survey data. We'll try to keep it short, and let's start by pointing out two things that McClatchy did right: It compared identical sample sizes drawn from the same population, thus avoiding the main evils of the "RCP Average" (which, basically, adds two apples to four oranges and expresses the result in pairs of kiwifruit).

That's a nice start, and it points to some interesting conclusions -- just not to the ones McClatchy draws. And it helps shoot down the lede without having to invoke confidence intervals or anything like that (sinners, why persisteth thou in reporting sampling error without thou also reporteth the confidence level, knowing how it doth yank the almighty's chain when you do?). Here's how the same writer summarized the findings of the Dec. 6-8 version of the poll:

The surveys suggest that the Democratic race is so close — and hangs on the candidates' resumes and personalities as much as it does on issues — that any of them could win or finish third.

Which, conveniently, is true as far as it goes. But if it was true then that Edwards "could win," how could he have "clawed his way into contention to win" in the new survey?

Anyway. To keep from getting too bogged down in yet another rant about how best to phrase interpretations of "margin of sampling error," let's just note that lots of things make this survey interesting -- just not the things mentioned in the lede. Basically, the data suggest that the only significant change on the Democratic side between Dec. 6-8 and Dec. 26-28 is that the McClatchy graphics desk has learned how to spell "Barack." What "survey says" isn't that anybody has clawed his or her way into anything. A better interpretation of the data suggests that whatever clawing and hissing might have gone on, it's produced no change that we can distinguish from the normal random workings of chance. And even if it doesn't let you crank up your vocabulary to McClatchy pitch, that's interesting.
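For anyone wondering why the confidence level matters: the "margin of sampling error" for the very same poll changes depending on which confidence level you pick. Here's a minimal sketch in Python, assuming a hypothetical simple random sample of 400 and the textbook normal approximation for a proportion. Real polls also adjust for weighting and design effects, which this ignores.

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a
    sample proportion p from a simple random sample of size n."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample of 400 respondents; p = 0.5 is the worst case.
n = 400
for conf in (0.90, 0.95, 0.99):
    moe = margin_of_error(0.5, n, conf)
    print(f"{conf:.0%} confidence: +/- {moe * 100:.1f} points")
```

Same sample, same 50-50 split, three different margins of error. Report one without the confidence level and the reader has no way to tell which one you mean.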

The significant changes on the Republican side are cool too (raising the question of whether the emphasis on nonsignificant movement on the Democratic side is the result of chance or of a pro-Democrat bias at McClatchy):

Taken together, this first poll in Iowa since campaigning resumed after a Christmas break showed a dead heat among the three leading Democratic candidates and a volatile clash between the two top Republican rivals there.

And how does the Dec. 26-28 survey reveal a "volatile clash" on one side but not on the other?

Huckabee's support dropped 8 percentage points* since the last McClatchy/MSNBC poll Dec. 3-6.

A major reason is that he has come under sharp criticism from rivals such as Romney, been blistered as a tax raiser in a $500,000 ad campaign aired by the anti-tax group Club For Growth, and faced new scrutiny by the media of his Arkansas record on such issues as pardons.

A couple of problems here. The Huckabee change is starting to look big (not necessarily significant; at 95% confidence, in a sample this size, 23% and 31% could both be chance variations around 27% support in the whole population, for example). It's worth mentioning, with a caution that it falls short at conventional confidence levels. What we can't do -- surveys only measure what they measure -- is say why the change is taking place. The bit about "sharp criticism" and fresh media scrutiny isn't "poll says," it's "writer guesses." It might be a good guess and it might not, but it doesn't have anything to do with the survey results. It's fine to point out the correlation, but it's dishonest to proclaim a cause.
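The Huckabee check can be sketched the same way. The sample size of 400 per survey is an assumption for illustration, not a figure from the report, and the helper names are hypothetical:

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Normal-approximation margin of error for a proportion p, sample size n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


def consistent_with(observed, true_p, n, confidence=0.95):
    """True if an observed share lies within the sampling margin
    around a hypothesized population share."""
    return abs(observed - true_p) <= margin_of_error(true_p, n, confidence)


n = 400        # hypothetical sample size for each survey
true_p = 0.27  # hypothesized support in the whole population

print(f"27% +/- {margin_of_error(true_p, n) * 100:.1f} points")
for reading in (0.23, 0.31):
    print(reading, consistent_with(reading, true_p, n))
```

Checking whether each reading falls inside a single interval is a conservative shortcut; a two-sample test on the difference itself is the more formal check, and its verdict depends on sample sizes the story would need to report.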

But we're missing some more interesting stuff too. McCain's support has grown significantly, and there's a significant decline in the number of Republicans plumping for "undecided." Both those are worth talking about, even if they don't fit the conventional story line.

Finally, somebody needs to be chastised for this:

Poll: Bhutto doesn't add to terror issue

DES MOINES, Iowa -- The assassination of former Pakistani Prime Minister Benazir Bhutto did not raise the profile of terrorism as an issue in the U.S. presidential campaign in Iowa, according to a new McClatchy-MSNBC poll.

"People are still voting on what they were voting on a week ago," said Brad Coker, pollster for Mason-Dixon Polling & Research.

Democrats still rank terrorism a very low priority. The survey found 5 percent of Iowa Democrats calling it their top concern, up from 1 percent earlier in the month, but with virtually no change during three nights of poll calls that started the evening before the assassination and continued the next two evenings when news of the murder dominated national media.

Aaaaaaaaaaaaaaaaaaaaaaaaaagh. One, you can't compare the priorities of Democrats and Republicans because you're not asking them the same questions. Two, the poll doesn't address whether the assassination "raised the profile of terrorism as an issue" (2A, it was already in the field for a day before the assassination, and 2B, the poll doesn't measure the "profile of terrorism as an issue"); that's a guess you can attribute to the guesser, but not to the data. Three, you're not measuring whether Democrats rank terrorism as a low priority. You're only measuring whether it's their highest priority. If (among the likely-attendee Democrats with land lines who were called during the second and third nights of the survey and had heard of the assassination) the assassination had moved "terrorism" up to the second priority for half your voters, it would have become a lot more salient, but you'd have no way of knowing it.
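To see why a "top concern" question can't settle salience, here's a toy example with made-up rankings for 20 hypothetical respondents. The issues, counts, and function names are illustrative only, not drawn from the poll:

```python
def top_priority_share(rankings, issue):
    """Share of respondents ranking `issue` first, which is all a
    'top concern' question records."""
    return sum(r[0] == issue for r in rankings) / len(rankings)


def top_two_share(rankings, issue):
    """Share ranking `issue` first or second, which a 'top concern'
    question never sees."""
    return sum(issue in r[:2] for r in rankings) / len(rankings)


# Hypothetical ranked priorities (first, second, third) before an event.
before = (
    [("economy", "health care", "terrorism")] * 10
    + [("health care", "economy", "terrorism")] * 9
    + [("terrorism", "economy", "health care")] * 1
)

# After: half the respondents move terrorism up to second place,
# but nobody new puts it first.
after = (
    [("economy", "terrorism", "health care")] * 10
    + [("health care", "economy", "terrorism")] * 9
    + [("terrorism", "economy", "health care")] * 1
)

for label, data in (("before", before), ("after", after)):
    print(label,
          "top concern:", top_priority_share(data, "terrorism"),
          "top two:", top_two_share(data, "terrorism"))
```

In this made-up case the top-concern share sits at 5 percent both times, while the share ranking terrorism first or second goes from 5 to 55 percent. A question that records only the first choice reports "no change" either way.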

We're always open to good arguments for less attention to polls, and this particular poll is one. But as long as we're going to spend the time and the space -- and the not-inconsiderable amounts of money it costs to run a random telephone sample in the era of the 20% response rate -- the least we can do is be honest with the data. McClatchy is failing at that, and member papers are, at the least, guilty of failing to insist on fundamental accuracy. Both camps need to shape up. Particularly if you've already declared a bias, try not to compound it with cluelessness.

* 9 points, according to the earlier report, but who's counting?


