
If You Torture The Data Enough, It Will Confess Anything

By Andrea Di Maio | June 29, 2009 | 1 Comment

open government data

A former colleague of mine, Paolo Magrassi, used to say this when challenging benchmarking reports, statistics, or quantitative analyses. His words came to mind after I read Casey Coleman’s latest post on her blog. I like her blog because it provides an authoritative yet informal view of events that affect the federal government. In her latest post she mentions Vivek Kundra’s vision of government that is “open by default”, i.e. data from federal agencies should be published and publicly available unless privacy and security considerations prevent it.

In my blog I’ve covered the theme of open government data many times, ever since the US President issued the memo that set the scene for it.

There is one point Casey makes that I do not entirely agree with. She says that open by default will remove the usual problem of selection bias (i.e. the self-selection of individuals who decide to participate in a survey or an experiment) by allowing all American citizens to actively participate in mashing up their own data in ways that they determine. I agree in principle, but I am skeptical that many Americans (or Canadians, or Australians, or French, or…) have a compelling interest in doing their own mashups. In most cases, mashers will be businesses, other government agencies, associations, and communities that have a purpose for mashing up.

And what could that purpose be, if not to prove a particular point, sell a new product or service, or influence public opinion on certain topics? Isn’t this something that political parties, unions, industrial associations, consumer advocacy groups, and the press have always been doing? Sure, open by default will empower more actors to leverage data, but at the end of the day they won’t do so for fun, but for a “business” interest of sorts.

Do not get me wrong. I think that open by default is great and, living in a country where many media are directly or indirectly controlled or influenced by government, I’d love to see somebody pursuing an open by default policy.

I just think that people need to set their expectations right and accept that, as Paolo said, if somebody tortures the data enough (open or not), it will confess anything.


1 Comment

  • Nick Jones says:

    I agree, Andrea. I would much rather have a government that was trustworthy and reliable and got on with the job of administering the country in an effective and unobtrusive manner. I have better things to do than mash up government data. It would be great if I didn’t have to worry about the fine-grained details of government, and my government only bothered me when a significant strategic decision had to be made. Cynically, I suspect the main driver for individual citizens to mash up government data is that they need to monitor governments which aren’t reliable enough to be left alone to do their job. Which is rather sad. I guess that “open by default” is a good idea, but I wish it wasn’t so necessary.