
Big Data: Rumsfeldian Thinking and Carl Sagan’s Camera

By Jack Santos | November 12, 2012 | 0 Comments

“There are known knowns; there are things we know that we know.
There are known unknowns; that is to say there are things that we now know we don’t know.
But there are also unknown unknowns – there are things we do not know we don’t know.”

– Donald Rumsfeld

I was reminded of this quote when reading about how Carl Sagan fought for cameras on early planetary probes – and lost. His thinking, ostensibly, was that a camera would provide answers to questions we weren’t asking, questions we couldn’t even imagine, and would help define new ones.

He lost his argument, but only for the first few Venus probes. Every spacecraft from then on has had a camera. The benefit in terms of scientific results, and public engagement, has more than made up for the cost.

I am reading a draft of a Big Data paper that my colleague Svetlana Sicular has written, to be published soon (on guidance for Big Data adoption). It occurred to me that we are in those same stages with Big Data. If we are driven by our urge for certainty, by a metric-driven, know-all-the-questions search for answers, we are bound to miss the big picture, and destined to miss important, business-affecting discoveries that answer questions we don’t even know to ask yet.

It’s like that right now: data is exploding not only from watching and measuring the actions of people on the internet (shopping, news reading, Facebook), but also from the “internet of things”.

I personally had to deal with that explosion in a hospital setting as a CIO. The question was about what constitutes a medical record versus a personal health record: what was “hospital” data, and what was “health history”. At the time, the massive amounts of bedside instrumentation data that were just beginning to find their way onto networks (usually not an IT network but a related “clinical engineering” network, another IT versus Operational Technology (OT) issue) were viewed as something we probably didn’t want to store for a significant period of time, or even give patients access to. It was questionable whether that data was part of the “electronic medical record”, much less the “patient health record”.

How wrong that point of view was, and is.

And munging (that’s a technical term) through that data, as an individual, a clinician, or a researcher, is like tunneling through a gold mine. Big Data is still struggling with how to define the questions.

What is the metaphorical equivalent of “Carl Sagan’s camera” for Big Data? We don’t know what we don’t know.
