Tuesday, May 18, 2010

The fear of all sums

All right -- I totally stole the hed from this week's Economist, because it's the best hed I've seen in a while and is far better than the lame effort I was playing with for this entry.

Anyway. Enjoyed the Ben Zimmer language column in the Times magazine this month, which reminds me that I haven't been rushing to the magazine first thing on Sunday mornings to read the language column, which probably means some important part of my weekly quota of political and epistemological train wrecks is being met somewhere else now that the "On Language" column is written by an actual language person. I count that as progress, and I expect to be a more regular reader.

Ben's topic is "quant" and how it got that way: a noun meaning not just people who do quantitative research, but people whose quanthood marks them as the double-naught spies of the financial world. It's a deft, professional and speedy (columnists are shortstops, not chessplayers) look at how a word that you might see in daily discourse came to mean what it does. Given my druthers, I'd like to see him spend some time on the cultural factors that make the Times itself so worshipful of some quanty approaches and so clueless about others, but that's asking a lot of a biweekly column that belongs to someone else.

Intentionally or not, the column is even nicer in context. This issue, after all, declares that "we are all statisticians now" (p. 37) and that for journalists, "the recession has ... made capitalists out of everyone" (p. 50). Fortunately, along with platitudes, it has a column by John Allen Paulos that addresses the importance of putting even the scariest of quant results into their appropriate social as well as statistical contexts.

All of that had me in an appropriate mood for Monday's Freep, which produced this candidate for dumbest statistic of the year:

The Fiesta could save its owners more than $100 a year in fuel costs compared with other subcompacts.

This, mind you, from the auto columnist who just the day before had set out to debunk "myths on fuel economy":

The window-sticker mileage figures are a guarantee of the mileage you'll get

Not even close. How you drive has a massive impact on your mileage. However, the window-sticker figures are the only way to realistically compare fuel economy and operating costs when you shop for a new vehicle.

The numbers are generated in lab tests, so every vehicle is held to the same standard. "Your mileage will vary" as the fine print says, but you can trust that a higher EPA rating will save you money.

What I'd like out of a writer who covers the industry? A little basic curiosity -- enough to ask about what the "lab tests" entail, and what the results look like, and their relationship to the number I see in the showroom. In other words, some reporting that not only helps you relate the test conditions to real life (validity) but hints at the relationship of the sample statistics to real life (reliability). There should be an equivalent of the much-abused "margin of error" here. What is it, and why isn't it a part of reporting about fuel economy?

That gets us off track a little from the cosmic measure of fuel economy introduced in Monday's story, the "dollar per year." What the Fiesta "could save" its owners is, in the story's terms, partly a question of statistical significance:

The Fiesta will lead other subcompacts by a significant margin, however. The Chevrolet Aveo, Honda Fit, Kia Rio, Mini Cooper, Nissan Versa, Scion xD and Toyota Yaris have EPA ratings of 27-29 m.p.g. in the city and 34-37 on the highway.

Not so fast. The city ratings -- 27-29 vs. 29 -- show that in some of those cases, there's no difference at all. How significantly 37 is different from 40 should have some bearing on the overall question, but we're going to have trouble interpreting that without some sort of handle on the average (or test) proportion of city miles and highway miles in our calculations.
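For a rough handle on how much the city/highway split matters, here is a minimal sketch. It assumes the Fiesta sits at 29 city and 40 highway (which is what the comparison above implies), pits it against a rival at the top of the quoted ranges (29/37), and blends the two ratings as a weighted harmonic mean, since fuel use adds up in gallons rather than in m.p.g. The function name and the 55% city default (the split the EPA uses for its combined figure) are assumptions for illustration, not anything the story reports.

```python
def combined_mpg(city_mpg, highway_mpg, city_share=0.55):
    """Blend city and highway ratings by the share of miles driven in each.

    Gallons add up, m.p.g. figures do not, so this is a weighted
    harmonic mean. The 0.55 default is an assumed city share.
    """
    if not 0.0 <= city_share <= 1.0:
        raise ValueError("city_share must be between 0 and 1")
    gallons_per_mile = city_share / city_mpg + (1.0 - city_share) / highway_mpg
    return 1.0 / gallons_per_mile


# All-highway, assumed default split, all-city.
for share in (0.0, 0.55, 1.0):
    fiesta = combined_mpg(29, 40, share)  # assumed Fiesta ratings
    rival = combined_mpg(29, 37, share)   # best case from the quoted ranges
    print(f"city share {share:.0%}: {fiesta:.1f} vs. {rival:.1f} m.p.g.")
```

The gap is widest for a driver who never leaves the freeway and vanishes for one who never gets on it, which is why the split has to be stated before "significant" means anything.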

And we're still nowhere near addressing the core components of DPY,* or dollars per year: how much does gasoline cost, and how much do you drive? Three years ago, the Official HEADSUP-L Saturn spent most of its time going back and forth on Stewart Road,** and it was fed about once a month. Today, it mostly goes up and down I-75, and it's fed twice a month. It's going about 30% farther per nom, but it has a lot farther to go, so in a short-term sense*** it's a relief that the price has stayed below $3/nom.
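To see how much the answer swings on those two inputs, here is a small sketch of the DPY arithmetic: miles driven, divided by m.p.g., times the price of a gallon. The mileages and prices are hypothetical, and the 37-vs.-40 comparison simply borrows the highway figures quoted above for illustration; none of this is the Freep's method, which the story doesn't describe.

```python
def dollars_per_year(miles_per_year, mpg, price_per_gallon):
    """Annual fuel cost: gallons burned times the price of a gallon."""
    return miles_per_year / mpg * price_per_gallon


def annual_savings(miles_per_year, mpg_old, mpg_new, price_per_gallon):
    """Difference in annual fuel cost between two m.p.g. figures."""
    return (dollars_per_year(miles_per_year, mpg_old, price_per_gallon)
            - dollars_per_year(miles_per_year, mpg_new, price_per_gallon))


# Hypothetical drivers and pump prices, for illustration only.
for miles in (6_000, 12_000, 20_000):
    for price in (2.50, 3.00, 4.00):
        saved = annual_savings(miles, 37, 40, price)
        print(f"{miles:>6} miles at ${price:.2f}/gal: ${saved:.2f} saved")
```

Under these assumptions, a driver covering 12,000 miles a year at $3 a gallon comes out about $73 ahead, so whether "more than $100" holds depends entirely on who is driving and what they pay.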

Summary? I'm not sure I'd be happy in a world in which we were all statisticians now. (I'm not going to qualify, for one thing.) But I'd be much happier about the future of journalism if people who undertook to use numbers to support their assertions actually paid attention to the numbers/assertions relationship and whether it did what it was meant to do. It's sort of like checking the oil.

* Freep style would probably call for periods: D.P.Y. Ack.
** Honk if you know how to get to the original Manor from Neff Hall.
*** Longer term, we should just tax the hell out of the stuff. My goal is to laugh all the way from campus to the light-rail stop a block from the brewpub.

Anonymous said...

Of course, NIST style would write that dimension as "USD/a". Not that I would expect too many of the Freep's readers to know that "USD" is the ISO 4217 symbol for "United States dollar", or that "a" is the official approved-for-use-with-SI symbol for "year".

1:11 AM, May 19, 2010  
Anonymous said...

If I might suggest where that figure probably comes from (besides the obvious, i.e., a press release): the Feds regularly publish statistics on the average length of a commute, in miles, for both the whole country and individual CBSAs. Your local MPO has a model which includes the proportion of freeway driving to city driving, with estimated LOS, as well as the distribution of vehicle types on the road; this information is used to evaluate transportation projects for Clean Air Act compliance. Several private organizations publish surveys of gas prices; if you pay them money, you can probably get it broken down by ZIP code or some other useful geography. Then divide the VMT/person/year from your population model by the MPG from your vehicle model and multiply by the price of gas from the survey. Compute the weighted average across all geographies in the market and you have the current cost of commuting. Substitute in the fuel economy of the new car, and you have the predicted cost of commuting; subtracting the one from the other gives you the "savings". I could forgive a paper for not describing the methodology in detail, assuming they did it right.

Of course, you can think up a dozen ways in which they could have done it wrong (starting with "assertion copied unattributed from company press release").
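The calculation walked through above can be sketched roughly as follows. Everything here is an assumption for illustration: the `Geography` fields, the function names and the two made-up areas stand in for whatever a real population model, vehicle model and gas-price survey would supply.

```python
from dataclasses import dataclass


@dataclass
class Geography:
    name: str
    drivers: int             # weight for the market-wide average
    miles_per_driver: float  # VMT/person/year from the population model
    fleet_mpg: float         # m.p.g. from the vehicle model
    gas_price: float         # dollars per gallon from the price survey


def weighted_cost(geographies, mpg=None):
    """Driver-weighted average annual fuel cost across all geographies.

    With mpg=None each area uses its own fleet figure (current cost);
    passing a number substitutes the new car's fuel economy everywhere.
    """
    total_drivers = sum(g.drivers for g in geographies)
    total_cost = sum(
        g.drivers * g.miles_per_driver / (g.fleet_mpg if mpg is None else mpg) * g.gas_price
        for g in geographies
    )
    return total_cost / total_drivers


def predicted_savings(geographies, new_car_mpg):
    """Current cost minus predicted cost with the new car."""
    return weighted_cost(geographies) - weighted_cost(geographies, new_car_mpg)


# Hypothetical market: every number here is invented.
market = [
    Geography("metro core", 100, 8_000, 25.0, 3.00),
    Geography("suburbs", 300, 12_000, 30.0, 2.80),
]
print(f"current cost:  ${weighted_cost(market):.2f} per driver per year")
print(f"with new car:  ${weighted_cost(market, 40.0):.2f}")
print(f"'savings':     ${predicted_savings(market, 40.0):.2f}")
```

Each of the inputs is a place where the result can go wrong, which is the point about needing to know where a published figure came from.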

2:38 AM, May 19, 2010  
The Ridger, FCD said...

An English friend of mine was appalled when she visited and saw the price of gas (about $3 at the time). "No wonder you don't conserve! How could you be expected to?" she said.

9:59 AM, May 22, 2010  

