headsup: the blog: Or not (one born every minute)

What's the latest in statistical inference there, Washington Post?

The Republican National Convention begins in Cleveland in 31 days. That means that one month from today, Republicans will (almost certainly) start the process of nominating Donald Trump as their presidential nominee to take on Hillary Clinton in the fall campaign.

That prospect looks increasingly problematic — somewhere between a stone-cold loser and a long-shot gamble. With not only the White House at stake but also Republicans' Senate majority and maybe even their House majority in real peril, the idea of nominating Trump should be cause for a growing sense of panic within GOP ranks.
No doubt they appreciate the advice. But surely there's some evidence, right?

Here's why, courtesy of a chart from RealClearPolitics detailing the polling averages for Clinton and Trump over the past three months.

Oomph.

Trump's numbers shot up in the wake of his victory in the May 3 Indiana primary, a win that effectively sealed his nomination. But as May wore on, Trump's poll numbers not only hit a wall, they began to collapse.

OK. Here are some suggestions if you want your analysis to be taken seriously:

1) Ignore the Real Clear Politics "polling average." It is not a meaningful number.
2) Never write paragraphs like "oomph," whether you understand the results you are describing or not.
3) If you insist on ignoring (1) and (2), at least look at the damn Y-axis.

First things first. The RCP "average" is meaningless for two reasons. One, it averages samples from overlapping populations, which means it can't generalize to any of them. The average above* includes samples of "registered voters" and "likely voters," the latter being a (differentially screened) subset of the former. Imagine trying to average samples of US adults and US adult males: you can't talk validly about what "adults" do, because some of the samples don't include women, and you can't talk validly about what "men" do, because some of the samples do include women.
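If it helps to see the overlapping-populations problem in numbers, here is a minimal sketch. Every figure in it is invented for illustration (no real poll is involved) and the function name is hypothetical. It pretends each poll lands exactly on its population's true value, then shows that the "average" moves with the mix of poll types even though nobody changed their mind.

```python
# Hypothetical "true" support levels, invented purely for illustration.
SUPPORT_REGISTERED = 40.0  # percent among registered voters
SUPPORT_LIKELY = 45.0      # percent among likely voters (a screened subset)


def naive_average(n_registered_polls, n_likely_polls):
    """Unweighted mean of poll results, assuming each poll hits its
    own population's true value exactly (no sampling error at all)."""
    results = ([SUPPORT_REGISTERED] * n_registered_polls
               + [SUPPORT_LIKELY] * n_likely_polls)
    return sum(results) / len(results)


# Same two populations, same opinions; only the mix of polls differs.
mostly_registered = naive_average(4, 1)
mostly_likely = naive_average(1, 4)

print(mostly_registered)  # 41.0
print(mostly_likely)      # 44.0
```

A three-point "swing" appears here with zero change in opinion, which is the sense in which the number doesn't describe either group.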

Two, the "average" ignores the different ranges within which a poll is a good guess. This doesn't mean some polls provide better guesses than others; given a specified confidence level, all polls are presumed to have identical proportions of good to bad guesses (19 to 1, at 95% confidence). It does mean that, all else equal, good guesses from a bigger sample will form a tighter curve than good guesses from a smaller sample. Long story short, if you want to say the average of "about five" and "about nine" is "about seven, probably, depending," that's fine. If you say it's "6.89," you're bullshitting. That really is how it works.
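For the curious, a rough sketch of why sample size changes how tight the guess is. It uses the textbook normal-approximation margin of error for a proportion and assumes simple random sampling, which real polls only approximate. The sample sizes are hypothetical, not taken from any poll in the RCP chart.

```python
import math

Z_95 = 1.96  # standard critical value for 95% confidence


def margin_of_error(p, n, z=Z_95):
    """Half-width, in percentage points, of the normal-approximation
    confidence interval for a proportion p from a sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


# Two hypothetical polls, both at 95% confidence, both at p = 0.5.
small = margin_of_error(0.5, 400)
large = margin_of_error(0.5, 1600)

print(f"n=400:  +/- {small:.2f} points")
print(f"n=1600: +/- {large:.2f} points")
```

Quadrupling the sample only halves the margin, and both polls are still "right" 19 times out of 20. Averaging their point estimates as if they were equally precise throws that information away.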

That gets to the importance of the Y-axis, which -- assuming the RCP "average" is a meaningful number, which it isn't -- helps explain when numbers "shoot up" and when they "collapse." Trump's "average" on this chart goes from a little under 41% shortly after the Indiana primary to 43.4% at the end of May, a relative increase of about 6.4% (roughly two and a half percentage points). It neither "hit a wall" (whatever that means) nor "collapsed" as "May wore on," unless a change from 43.4% to 42.8% in an average of guesses represents a "collapse." At the end of the chart, Trump is about where he was two months earlier, with some intervening movement that might represent a surge, a collapse, or none of the above. Soaring and plunging get a lot less dramatic when you apply some basic visual hygiene to the presentation -- say, starting the Y-axis at 0.
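The arithmetic, sketched out for anyone who wants to check it. The 43.4 and 42.8 figures are the ones quoted above; 40.8 is an assumed reading of "a little under 41%," so treat the first result as approximate.

```python
def relative_change(old, new):
    """Percent change from old to new, relative to old."""
    return (new - old) / old * 100


# 40.8 is an assumption standing in for "a little under 41%".
surge = relative_change(40.8, 43.4)
# The so-called collapse: 43.4% down to 42.8%.
dip = relative_change(43.4, 42.8)

print(f"surge: {surge:.1f}%")  # surge: 6.4%
print(f"dip:   {dip:.1f}%")    # dip:   -1.4%
```

A drop of about 1.4% relative, or six-tenths of a point, is the "collapse" in question.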

Over the period that caught the Post's attention, Clinton's decline** is more dramatic than Trump's surge; she's down about 8.5% before the plateau you see at the right side of the chart. Here's the Post's explanation:

What's fascinating in the chart above is that Clinton's appeal hasn't surged as Trump's has dipped badly. These aren't people deciding to be for Clinton; it's people deciding they aren't for Trump. The more Trump is exposed to a general electorate — at least in the early days of his time as the presumptive Republican nominee — the less that electorate likes him.

Again: If the proportions were real (and the visual presentation wasn't the sort of thing that "How to Lie with Statistics" warned against 60 years ago), the chart could be interpreted as showing that more people are deciding they "aren't for" Clinton than are deciding they "aren't for" Trump. That would be a risky conclusion itself. For one thing, "if the election were held today, would you vote for ... ?" is a very different question today than it was three months ago. "I'd vote for a rabid ferret before I'd vote for Hillary" is one thing when the choices are presented in a 2x12 matrix. It's quite another when the choices are down to Clinton, a rabid ferret and the Libertarian dude.

... The problem for Republican elected officials who worry that Trump's tanking numbers might — if they continue to plunge — cost the party much more than the White House is that they have no other options.

Given that Trump's numbers are neither "tanking" nor "continuing to plunge" (if you select the 3-month view of this race at RCP now, you'll see them turning upward, because an earlier poll appears to have been added to the mix***), the problem is actually quite different. The Post has been going straight at the preposterous Trump candidacy, which is a bold and outstanding public service that has obviously incurred some counterbattery fire. The Post needs to stop shipping ammunition to the Trump campaign, which it does when it fabricates stupid conclusions from bogus statistics, because shipping ammunition to the people who are shelling you is a fundamentally stupid proposition.

Does the Post really want to mess with the enemies of freedom? How about a style rule requiring every BENGHAZI!!!!! story from now through the election to note that Fox News itself blamed the Benghazi attack on an obscure Internet video? That's actually true, and it doesn't leave you open to accusations of card-stacking that arise from getting in over your head with data.

* Many RCP "averages" also include samples of adults, which makes things even less clear.
** In case we need to say it again: "Clinton's decline as expressed with fictional numbers on a deliberately misleading chart."
*** Right, that means the X-axis is also not a linear representation of what it claims to represent linearly. Have we mentioned yet that the "RCP average" is a joke?

Labels: editing, polls, washington post

Sunday, June 19, 2016
