Tuesday, September 04, 2018

Random babbling about innocent numbers

What's entertaining about the CNN polling analysis on Sunday morning's home page isn't whether it slips on a statistical banana peel or two, but how much of it bears any resemblance to things that actually happen in the political ecosystem. Shall we?

Poll of the week: A new NBC News/Wall Street Journal poll shows 50% of voters prefer November's election to result in a Democratic-controlled Congress compared with 42% of voters who prefer a Republican-controlled Congress.

I don't love the syntax; I'd go with "prefer that November's election result in ..." if some editor insisted that you can't say "want November's election to produce," but it's not completely out of bounds for "what is your preference for the outcome of this November's congressional election."*

This result is in-line with other polls from NBC News/Wall Street Journal and from other generic ballot polls more generally. 

True so far, despite the irrelevant hyphen in "in-line." Since December in this poll, the result for Democrats has ranged from 47% to 50%; for Republicans, from 39% to 43%. "More generally" is sort of what "other generic ballot polls" means, but that's copy-editing, not reality vs. unreality. Which gets us to the paragraph reproduced above, which needs to be gone through more carefully.

At the most basic level, the generic ballot can be seen as an estimation of the House popular vote. 

Technically ... yeah, that's what the question tries to estimate. Whether it produces as good an estimation as "If the election for Congress were held today, would you vote for [ROTATE] the Democratic candidate in your district or the Republican candidate in your district?"** is another matter, but the sentence is misleading for a different reason. There's no Electoral College for the House; the "popular vote" is the only kind we have.

Given where the generic ballot average is at this point (an 8-point Democratic advantage), there is roughly an 6 to 8 point margin of error in predicting the final vote.

Wrong on multiple counts. By "generic ballot average," the writer may mean the RealClearPolitics average for generic ballot questions, which at this writing is given as 8.4 points. That "average," of course, is not a meaningful number; you could use a weighted mean to smooth out the eight different sample sizes it represents (RCP doesn't), but that doesn't cover the different populations sampled (registered voters vs. likely voters) or the different questions asked. "If the election were held today ..." and "what is your preference for ..." look like they're tapping the same sentiment, but still: You can't average the results of different questions. That's literally cheating.

The so-called "average" doesn't have a "margin of error," and RCP doesn't claim that it does, but if it did, the "margin" wouldn't be affected to that degree by the difference in proportions or by the time represented "at this point."*** Even if there was a meaningful "average," it wouldn't affect the "margin of error" as claimed. The difference in proportions is represented in the numerator of ((p)(1-p))/n, of which we take the square root to get a standard error of proportion. The 50% Democrat result in the NBC/WSJ sample produces a margin of error of 3.3 points at 95% confidence;**** the margin on the Republican result is 3.2. And the "margin of error" isn't related to the "final vote," which doesn't happen until 10 weeks after the survey concluded. We report sampling error because we can measure it, but we can't measure biases like nonresponse or social desirability. Fabricating a "margin" that apparently accounts for biases like those is just blowing smoke.
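For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation described above. The sample size of 900 is an assumption, picked because it reproduces the 3.3-point figure; the function name is made up for illustration, and 1.96 is the usual multiplier for 95% confidence.

```python
import math

Z_95 = 1.96          # standard multiplier for 95% confidence
ASSUMED_N = 900      # assumption: not stated in the post, chosen to match the 3.3-point margin


def margin_of_error(p, n, z=Z_95):
    """Sampling margin of error for a single proportion p from a sample of size n."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return z * standard_error


# Margins in percentage points, rounded to one decimal place
dem_margin = round(margin_of_error(0.50, ASSUMED_N) * 100, 1)    # 50% Democratic result
rep_margin = round(margin_of_error(0.42, ASSUMED_N) * 100, 1)    # 42% Republican result
split_margin = round(margin_of_error(0.70, ASSUMED_N) * 100, 1)  # the 70-30 split from the footnote

print(dem_margin, rep_margin, split_margin)
```

The point to notice is how little the proportion matters: moving from a 50-50 split to 70-30 changes the margin by a fraction of a point. Nothing in this formula knows about the calendar or the "final vote."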

In other words, Democrats end up winning by double digits or roughly tied with Republicans. That's far from perfect, though is actually more predictive than an average for a Senate race would be at this point.  

If there was a "margin of error" involved here, that's just about exactly the opposite of how it would work. Assuming a 6-point lead and a 6-point margin, nonchance results could include either a double-digit win or a tie -- but those are the far ends of the distribution, and the bulk of findings from repeated samples would cluster around the actual population results and slope downward from there.
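The shape of that distribution can be sketched with a normal approximation. This is an illustration under stated assumptions, not a claim about any real poll: the 6-point lead and 6-point margin are the hypothetical from the paragraph above, the margin is treated as 1.96 standard errors, and the function name is invented here.

```python
from statistics import NormalDist

Z_95 = 1.96  # multiplier assumed to link the margin to the standard error


def share_between(center, margin, low, high, z=Z_95):
    """Share of repeated samples expected to land between low and high,
    assuming results are normally distributed around center."""
    sigma = margin / z
    dist = NormalDist(mu=center, sigma=sigma)
    return dist.cdf(high) - dist.cdf(low)


lead, margin = 6.0, 6.0  # hypothetical lead and margin, in points

# Samples within 2 points of the hypothetical lead
near_center = share_between(lead, margin, 4.0, 8.0)

# Samples in the outermost point at either end of the interval
# (near a tie, or near a 12-point win)
at_edges = (share_between(lead, margin, 0.0, 1.0)
            + share_between(lead, margin, 11.0, 12.0))

# The whole interval from a tie to a 12-point win
whole_interval = share_between(lead, margin, 0.0, 12.0)

print(near_center, at_edges, whole_interval)
```

Under these assumptions, far more samples land within a couple of points of the center than out at the tie or the double-digit win. "Either a blowout or a tie" describes the edges of the interval, not what the interval says is likely.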

I don't know what is meant by "actually more predictive than an average for a Senate race," except that "at this point" has nothing to do with that, either. It may reflect a concern about the disproportion in how many people a Senate seat represents (say, between Montana and California), but individual statewide polls would be very good representations of Senate races -- to the extent that hypothetical questions are predictive. Why an "average" is relevant here ... SQUIRREL!

... The exact margin though is likely to be smaller than we thought it would be at the beginning of the cycle. It may be closer to 6 points than 8 points. And while 6 is still significantly greater than 0, it's much more doable than 8.

OK, I guess we're going to give up on the copyediting altogether. But to the broader concern, CNN is more or less just making it up at this point. It's fun to make fun of Fox News, because Fox as a whole is so desperately a tool of the Trump wing of the Republican Party, but when it comes to straight-up reporting of survey results, Fox continues to err on the side of accurate and dispassionate. Why CNN thinks it needs to analyze public opinion by producing random, poorly punctuated babbling about innocent numbers is beyond me, but it does suggest that you should use CNN's polling analyses for entertainment value only.


* Yes, the noun modifier should be "Democrat-controlled," but the burnt hand dreads cold water.
** The phrasing is Fox's.
*** The timing could certainly affect the prospective "If the election were held today" question if it was after a primary, rather than before.
**** For convenience, it's common to use .25 as the numerator, because that produces the largest possible margin of error for any proportion from the sample. A 70-30 split on a question would produce a margin of error (95% confidence) of 3 points, rather than 3.3, with the same sample. And always report those confidence levels, kids!
