### Polling sins: No, no and don't

Sigh. In order, more or less:

1) Polls don't have "error rates." That figure isn't a rate, and it doesn't count errors. (Your fielding average actually does produce an error rate, if you subtract it from 1 and move the decimal point around.)

2) You can describe the figure as "percentage points" if you want, but "4.9%" is simply wrong. Take some finding in any poll that reaches 50% -- "do you approve or disapprove of the job Rudolph is doing as lead reindeer?" -- and add 4.9% to it, and you'll get 52.5%. Should Rudolph poll at 10% approval, adding 4.9% would give you 10.5%. The "margin of error," on the other hand, applies equally to any point in the distribution. That's what the next sentence means, but ...

3) Adding and subtracting percentage points is not "statistically adjusting" a finding. It's "addition and subtraction." Third-graders can do it. It really is that simple.

These things are relevant because we're approaching the season in which writers are more and more tempted to babble profoundly about stuff they know very little about -- to wit, the relationships among survey findings, public opinion and political behavior. The grownup wing of journalism -- the part that's supposed to help active, informed citizens carry out their democratic duty -- handles numbers and their related concepts with a blithe cluelessness that would be a firing offense on a competent sports desk. Please, for the children -- cut it out.

First off, we should probably just throw the term "margin of error" overboard. (If you think your audience will panic at "confidence interval," we could try "margin of confidence" as a comfortable transition until things calm down a little.) If you must use it, remember its middle name -- "margin of sampling* error" -- and always, always report the confidence level at which it's calculated. The 4.9-point figure noted above is calculated at 95% confidence, meaning that 19 times out of 20, any sample proportion will be within that distance of the actual figure representing the population at large. You can turn the "margin of error" up or down as you will; if you accept a confidence level around two-thirds, it becomes 2.5 points for a sample of the size described here. That really is how it works.
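The arithmetic behind those figures is nothing fancier than the standard formula for the margin of sampling error, z times the square root of p(1 - p)/n. A quick sketch in ordinary Python, using the n = 400 sample discussed below and the worst-case proportion of .5 that pollsters conventionally assume:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a sample proportion.

    z = 1.96 corresponds to 95% confidence; z = 1.0 to roughly
    two-thirds (about 68%) confidence. p = 0.5 is the worst case --
    the "maximum margin of error" polls conventionally report.
    """
    return z * math.sqrt(p * (1 - p) / n)

# n = 400 at 95% confidence: the 4.9-point figure
print(round(margin_of_error(400) * 100, 1))         # 4.9
# Same sample at ~68% confidence (z = 1): 2.5 points
print(round(margin_of_error(400, z=1.0) * 100, 1))  # 2.5
```

Turn the confidence level down and the margin shrinks; turn it up and the margin grows. Nothing is being "adjusted" -- the same sample supports both statements.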

Does that require a statistical adjustment? Not if you do it right. Sampling always involves a little bit of spit and duct tape; we're trying to generalize from some of a population to the entire population, and if we got a proper random sample, the guessing can be fairly precise. For this sample (n = 400 "Detroit residents"), we can be 95% confident that any sample is within 4.9 points of what all "Detroit residents" really think. One time out of 20, it won't.

Long story short, it's easier than it looks. You can tell your audience how it works, or you can bullshit your audience. It's that simple, depending on how much you don't want to be caught bullshitting your audience.

* Sampling error is a kind of error we can easily quantify. Things like question design error and response bias are far more challenging than one equation in a stats textbook can solve.

Labels: .War on Editing, clues, polls, statistics

## 2 Comments:

> The "margin of error," on the other hand, applies equally to any point in the distribution.

It really doesn't. The "margin of error" as quoted is the maximum margin of error -- the half-width of a 95% confidence interval at 50%. At 10%, the half-width is about 2.5 percentage points, and at 1% the confidence interval is noticeably asymmetric, e.g. (0.3%, 2.3%).

Before our last election, I made a detailed table for New Zealand journalists, though just for sample size 1000.

You're right, and thank you.

The maximum margin of sampling error is calculated with the numerator at .25, which would reflect the sort of 50-50 split that rarely happens in real life. In the proverbial Crook vs. Liar campaign, we're discussing at least three proportions: Crook vs. not-Crook, Liar vs. not-Liar, and Other vs. not-Other. Outside an even 33-33 race, each standard error would be a little different, so for convenience we generally report the margin at the maximum for the entire sample. (If a study doesn't report a different "margin of error" for subgroups, put your hand over your wallet and back away.)
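To put some numbers on that (a sketch in ordinary Python; the n = 400 sample is from the post above, and the subgroup size of 100 is a hypothetical for illustration):

```python
import math

def moe(n, p, z=1.96):
    # Margin of sampling error at 95% confidence for a proportion p
    return z * math.sqrt(p * (1 - p) / n)

n = 400
# The numerator p(1 - p) peaks at p = .5 -- hence "maximum" margin
print(round(moe(n, 0.50) * 100, 1))   # 4.9 points
print(round(moe(n, 0.10) * 100, 1))   # 2.9 points -- smaller away from 50%

# A subgroup of, say, 100 respondents carries a much wider margin
print(round(moe(100, 0.50) * 100, 1))  # 9.8 points
```

Which is why a "margin of error" quoted for the whole sample quietly overstates the precision of the full-sample extremes and understates it for any subgroup.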

Thanks for the link, and please continue to share the gospel among our antipodean friends.
