Andrew White

A member of the Gartner Blog Network

Research VP
8 years at Gartner
22 years IT industry

Andrew White is a research vice president and agenda manager for MDM and Analytics at Gartner. His main research focus is master data management (MDM) and the drill-down topic of creating the "single view of the product" using MDM of product data. He was co-chair…

Big Data Will Lead to Big False Positives – Lest You Take Account of Trust (or lack thereof)

by Andrew White  |  January 10, 2014  |  3 Comments

While reading ‘The Signal and the Noise’ (Nate Silver, 2012) over the holidays, I realized I had not fully grasped the importance of a note I recently published called ‘Big data governance – from truth to trust’.  That note should be retitled ‘Big data value – from truth to trust’.  Silver explores big data in several places in his excellent book, and he calls out the somewhat obvious point that more data brings more variability within that bigger pool of data.  Specifically, as data grows significantly, new theories (and assumptions) of causation (versus correlation) will emerge.  They may emerge at such a prodigious rate that our testing won’t be able to keep up, and so our ability to improve our understanding (i.e., make better predictions) will fall.  That is a useful caution to keep in mind given the high level of hype associated with big data.

In other words, with Big Data comes Big False Positives.  As Silver states, “…[T]he number of meaningful relationships in the data – those that speak to causality rather than correlation and testify to how the world really works – is orders of magnitude smaller.”  This is the exact cause (no pun intended) of the shift in emphasis that our research note (and blog post, From MDM to Big Data – from Truth to Trust) calls out: from a focus on absolute truths toward an understanding, and thus exploitation, of the degrees of trust in that meaning, or correlation.  If truth is no longer possible, trust is at least plausible.  Go Big Trust!
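As a rough illustration of the false-positive point (this is not taken from Silver's book or from the research note), the sketch below generates columns of purely random, independent numbers and counts how many pairs of columns look correlated anyway. The function names, the sample size of 50, and the 0.3 correlation threshold are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(num_variables, num_samples=50, threshold=0.3, seed=0):
    """Count pairs of independent random columns whose |r| exceeds threshold.

    Every column is independent noise, so any pair counted here is a
    false positive by construction.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
        for _ in range(num_variables)
    ]
    hits = 0
    for i in range(num_variables):
        for j in range(i + 1, num_variables):
            if abs(pearson(columns[i], columns[j])) > threshold:
                hits += 1
    return hits


if __name__ == "__main__":
    for k in (10, 40, 160):
        total_pairs = k * (k - 1) // 2
        print(k, "variables,", total_pairs, "pairs,",
              count_spurious_pairs(k), "spurious correlations")
```

The number of pairs to examine grows with the square of the number of variables, so the count of chance "relationships" climbs quickly even though none of them is real. That is the sense in which more data can outrun our ability to test what it seems to show.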

Category: Analytics, Big Data, Business Intelligence, Decision Making, Enterprise Information Management, Information Management, Information Policy, Information Trust, Instrumentation

3 responses so far ↓

  • 2 sushant   January 14, 2014 at 8:09 am

    Hi Andrew, nice post. You are quite right to point out the inadequacies of implementing a big data system without thinking it through completely. Often the requirements for big data analysis are not well understood by the developers and business owners, which results in an undesirable product.

    To avoid wasting precious time, money, and manpower on these issues, organizations need to develop the expertise and a process for quickly creating small-scale prototypes and testing them to demonstrate their correctness and their fit with business goals.

    Following up on this, I came across and registered for a webinar on Deploy Big Data solutions Rapidly in Cloud through Harbinger’s ABC model (Agile-Big Data-Cloud). It looks like a promising one: http://j.mp/19xJ6ew

  • 3 bhaswar   January 15, 2014 at 5:28 am

    Big data is not for everyone. Period.

    I somewhat agree with the conclusion but not with the reasoning.

    Most businesses will continue grappling with structured data rather than unstructured data. Trends in IT spending reflect this. Predictive models will get richer and more complex, resulting in enterprises making better decisions. So big data, for all the hype, will not be very big for most businesses anyway.

    Big data models may give false positives mainly due to incorrect models and/or insufficient or wrong data (not so much false correlations). We still need SMEs to evaluate the decisions from such models. A huge gap I currently see is in problem definition for Big Data. That gap is only going to widen with IoT and the resulting data explosion.