Although I’ve been blogging on all things information and data for almost three years, a change of continent, change of circumstances and change of employer (in that order) now see me here at Gartner Inc. within the Business Analytics research group. So naturally, any of my musings will also shift over to come out through the Gartner Blog Network.
This first post as a Gartner-ite (Gartnerist? Gartner-er?) is actually nothing at all to do with being at Gartner (honest). A conversation this week about the relationships between Big Data, Data Scientists, Analytics and Data Quality put me in mind of the Seven Gates of Islamic Hell.
There’s still much debate going on between information management and data analytic types about the role of the Data Scientist, and whether there are enough of them to go around (and the word “unicorn” keeps popping up…). Meanwhile, there seem to be plenty of folks out there calling themselves “Data Scientists” who seem to be advocating a sort of analytical Heaven – “just give me all your data (and money) and I’ll do some digital alchemy and we’ll create an insight into something that we didn’t know before. It’ll be great.”
The idea of heaven might be seductive, but when reality bites, it’s pretty unlikely that any of us will ever actually get there.
I’d suggest that to get to a productive and valuable outcome from analytics, we actually need to go on a trip to Analytic Hell. (Hell, at least, for the sort of self-proclaimed Data Scientist Alchemist that I’ve just described, because it requires some clear methodological thinking and there’s no magic involved.)
I particularly like the Islamic version of Hell because it’s so specific about the types of sinners who will be punished, and the hellfire that awaits them. There are seven “Gates of Analytic Hell” to pass through:
The first Gate, Jehannam (for those who reject obedience): “What is the question we’re trying to answer?”
The second Gate, Ladha (for apostates): “Why do we want to answer the question?”
The third Gate, Saqar (for the non-pious): “What action will we take as a result of answering the question?”
The fourth Gate, Al Hutamah (for the greedy): “Do we have enough information to answer the question?”
The fifth Gate, Jaheem (for the proud): “Do we know if the information that answers the question is fit for purpose?”
The sixth Gate, Sa’eer (for those who do not believe in Judgement): “What rules are we applying to validate that the information is fit for purpose?”
The seventh Gate, Al Haawiyah (for the Hypocrites): “What manipulations will we apply to the data to make it fit for purpose?”
If we can’t answer all of those, in order, then any challenges with structure (or not), format (or not), filtering (or not) and tools (or not) are irrelevant and we shouldn’t be doing anything with technology in the first place. Decisions about whether those are done with relational techniques, SQL, NoSQL, Hadoop, Data Discovery, explicitly modelled or programmatically inferred etc. all come afterwards.
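For the programmatically inclined, the “in order” part can be sketched in a few lines of Python. This is only an illustration of the idea, not a tool or a method: the function name `first_unanswered_gate` and the sample answers are made up for this example.

```python
from typing import Dict, Optional

# The seven questions from the post, in the order they must be answered.
GATES = [
    "What is the question we're trying to answer?",
    "Why do we want to answer the question?",
    "What action will we take as a result of answering the question?",
    "Do we have enough information to answer the question?",
    "Do we know if the information that answers the question is fit for purpose?",
    "What rules are we applying to validate that the information is fit for purpose?",
    "What manipulations will we apply to the data to make it fit for purpose?",
]


def first_unanswered_gate(answers: Dict[int, str]) -> Optional[int]:
    """Return the number (1 to 7) of the first gate with no answer, or None if all pass.

    Answers to later gates do not count while an earlier gate is still open.
    """
    for number in range(1, len(GATES) + 1):
        if not answers.get(number, "").strip():
            return number
    return None


# Hypothetical answers: only the first two gates have been thought through.
answers = {
    1: "Which customers are likely to cancel next quarter?",
    2: "Keeping a customer is cheaper than finding a new one.",
}

blocked_at = first_unanswered_gate(answers)
if blocked_at is None:
    print("All seven gates passed: now we can talk about tools.")
else:
    print(f"Stopped at gate {blocked_at}: {GATES[blocked_at - 1]}")
```

With only the first two questions answered, the check stops at the third Gate, which is the point: nothing about SQL, NoSQL or Hadoop appears anywhere in it.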
My Gates of Analytic Hell are all about being able to differentiate noise from signal. That’s just another way of saying “data quality”. If a “Data Scientist” doesn’t want to discuss data quality, then they don’t understand data and they’re not a scientist. Alchemists (and unicorns) need not apply. Data quality = data value.
I’ll take Analytic Hell over analytic Heaven anytime.