Friday, October 12, 2007

No! Make it duller!

Last sermon of the week on the proper reporting of public opinion surveys. Like the others, it's going to suggest that most poll stories should be made even duller than they look now. There's a good reason for that, and it comes with some implications for how journalism handles science in general and research results from social science (which is what political polling is, after all) in particular.

Today's caution is the concept of the "statistical tie" or "statistical dead heat" (which the AP, quite correctly, recommends against using). It's rooted in a good idea, which is the recognition that results within a particular range are more likely than we'd prefer to reflect nonsignificant differences, but many of them aren't "ties." Here's a convenient example from the AP, reporting on an LAT/Bloomberg poll last month:

Among Republicans, Romney [28%] is in a statistical tie with Giuliani [23%]. ... The Los Angeles Times/ Bloomberg poll surveyed 1,312 New Hampshire voters from Sept. 6 to 10, with a sampling error margin of plus or minus 3 percentage points. It included 618 Democratic primary voters with a sampling error margin of plus or minus 5 percentage points, and 430 Republican primary voters with a sampling error margin of plus or minus 5 percentage points.

OK. The AP gets an A-minus for sweating the details. It loses points for rounding off the confidence interval (as the stylebook says, these ought to be reported to one decimal place) and for not reporting the confidence level. And here, at last, is why that perpetually ignored concept is really, really important: You can make Romney's lead significant with a tweak of the confidence level. First, this commercial.

Whenever you set out to test a hypothesis within a sample:
* Scary advertisements make people more likely to vote for Giuliani than happy advertisements
* This treatment makes cancer stay in remission longer than a placebo
* Violent cartoons make kids act violently
... you do so with the understanding that you might find an effect that isn't there in the population as a whole -- a false positive* (the scary ads work in your sample but not in real life, and so forth). The confidence level is the likelihood of a false positive that you agree in advance to accept. In social science generally, that's 5%, or 95% confidence: One chance in 20 that the result you got came about by chance, rather than reflecting something real in the population. It's an arbitrary standard, like the idea that people are magically old enough to drive on their 16th birthday.
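That "one chance in 20" is easy to see in a quick simulation -- a sketch, not anything from the Times/Bloomberg data. Poll a hypothetical population whose true split is exactly 50-50 over and over, and count how often the sample result lands outside its own 95 percent margin of error:

```python
# Simulate repeated polls of a population with a true 50-50 split and
# count the "false positives": samples whose 95 percent confidence
# interval misses the true value. The sample size (430) matches the
# GOP subsample in the story; the seed is arbitrary.
import math
import random

random.seed(7)
n, trials, misses = 430, 2000, 0
for _ in range(trials):
    sample = sum(random.random() < 0.5 for _ in range(n)) / n
    moe = 1.96 * math.sqrt(sample * (1 - sample) / n)
    if abs(sample - 0.5) > moe:
        misses += 1

print(misses / trials)  # hovers near 0.05 -- the accepted false-positive rate
```

The miss rate comes out close to the 5 percent we agreed to accept in advance; that's all the confidence level is.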

In surveys, it teams up with the confidence interval, or "margin of sampling error," to tell you what a 48-44 result in the hotly contested Crook vs. Liar Senate race really means. Given the 1,300-odd sample above, it means Crook's support in the population as a whole is somewhere between 45.3 percent and 50.7 percent -- except for the one chance in 20 that we missed and it's somewhere else.
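The arithmetic behind that interval fits in a few lines of Python, assuming the usual normal approximation and the 1.96 z-score that goes with 95 percent confidence:

```python
# Back-of-the-envelope check of the Crook vs. Liar example:
# 48 percent support in a sample of 1,312 at 95 percent confidence.
import math

n = 1312
p = 0.48   # Crook's share in the sample
z = 1.96   # z-score for 95 percent confidence

moe = z * math.sqrt(p * (1 - p) / n) * 100  # margin in percentage points
print(round(moe, 1))                         # 2.7
print(round(48 - moe, 1), round(48 + moe, 1))  # 45.3 50.7
```

Which is where the 45.3-to-50.7 range comes from -- and note that reporting it to one decimal place, as the stylebook asks, costs nothing.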

Given the smaller GOP sample (430) and its margin of sampling error of 4.7 points (the AP's plus-or-minus 5, before rounding), what can we say about "Romney 28, Giuliani 23"? A number of things. The figure for the whole population could be Romney 28, Giuliani 23. Or it could be Giuliani 27, Romney 24. Or it could be Romney 32, Giuliani 19. We don't call it a "lead" because it could be any of those.
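Same arithmetic, smaller sample -- here using the conventional worst case of p = 0.5, which is how pollsters usually quote a single margin for the whole poll:

```python
# Margin of sampling error for the 430-person GOP subsample at
# 95 percent confidence, using the conservative p = 0.5.
import math

n = 430
z = 1.96
moe = z * math.sqrt(0.5 * 0.5 / n) * 100  # in percentage points

print(round(moe, 1))                          # 4.7
print(round(28 - moe, 1), round(28 + moe, 1))  # Romney: 23.3 to 32.7
print(round(23 - moe, 1), round(23 + moe, 1))  # Giuliani: 18.3 to 27.7
```

The two ranges overlap, which is all "statistical tie" is trying (and failing) to say.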

Now comes the fun part. Turn the confidence level down past 65% and you can get the margin of sampling error down to 2.4 points -- in other words, Romney's lowest score is 25.6, and Giuliani's highest is 25.4. It's a "significant" lead, in that there are no nonchance cases in which Giuliani could be ahead. But there's one chance out of three, rather than one chance out of 20, that the result came about by accident.
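You can watch the margin shrink as the confidence level relaxes -- a sketch using the standard library's normal distribution (Python 3.8+) and the same p = 0.5, n = 430 assumptions as above:

```python
# How the margin of sampling error for the 430-person subsample
# shrinks as the confidence level is turned down.
from statistics import NormalDist

n = 430
for conf in (0.95, 0.80, 0.68):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-tailed z-score
    moe = z * (0.25 / n) ** 0.5 * 100
    print(f"{conf:.0%} confidence: +/- {moe:.1f} points")
```

At roughly 68 percent confidence the margin drops to 2.4 points and Romney's "lead" turns significant -- but only by agreeing, in advance, to be wrong about one time in three.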

So "statistical tie" overstates things a bit. More likely than not, this survey shows a real lead for Romney -- but it's only a little more likely than not. Meaning:

Among Republicans, 28 percent favored Romney and 23 percent favored Giuliani. The results suggest that Romney would have a lead if all Republicans in the state were surveyed, but the probability that the result came about by chance is too large for the difference to be considered significant by conventional polling standards. (Depending on the numbers, you might be able to use: The results are unlikely to represent a difference that would be considered significant ..., but that's more long division than I'm in the mood for.)

It gets boring, doesn't it? That's a good sign. Surveys, like most forms of social science, produce lots of incremental knowledge and very few breakthroughs.


* For extra credit, name the other three possibilities. This will be on the midterm.
