I was reading ‘The Signal and the Noise’ (Nate Silver, 2012) over the holidays and realized I hadn’t really grasped the importance of a note I recently published called ‘Big data governance – from truth to trust’. That note should be re-titled ‘Big data value – from truth to trust’. Nate Silver explores Big Data in several places in his excellent book; he calls out the somewhat obvious point that with more data comes more variability in that bigger pool of data. Specifically, with significant growth in data, new theories (and assumptions) of causation (versus correlation) will emerge, perhaps at such a prodigious rate that our testing won’t be able to keep up, and so our ability to improve our understanding (i.e. make better predictions) will fall. That is a great caution to keep in mind when we consider the high level of hype associated with big data.
In other words, with Big Data comes Big False Positives. Thus, as Silver states, “…[T]he number of meaningful relationships in the data – those that speak to causality rather than correlation and testify to how the world really works – is orders of magnitude smaller.” This is the exact cause (no pun intended) of the shift in emphasis our research note (and blog, From MDM to Big Data – from Truth to Trust) calls out: from a focus on absolute truths toward an understanding, and thus exploitation, of the degrees of trust in that meaning, or correlation. If truth is no longer possible, trust is at least plausible. Go Big Trust!
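To make that point concrete, here is a minimal Python sketch (my own illustration, not from Silver’s book or the original research note) of the multiple-comparisons effect: when you screen many purely random variables against a target, some of them will look “significantly” correlated by chance alone, and the count of such false positives grows with the number of variables tested. The sizes and threshold below are hypothetical choices for illustration.

```python
# Illustrative sketch: spurious correlations from testing many random variables.
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_vars = 100, 1000                      # hypothetical sample size and variable count
data = rng.standard_normal((n_obs, n_vars))    # independent noise: no real relationships exist

# Correlate every other variable against the first one and count |r| > 0.2,
# roughly the two-tailed p < 0.05 threshold for 100 observations.
target = data[:, 0]
corrs = np.array([np.corrcoef(target, data[:, j])[0, 1] for j in range(1, n_vars)])
spurious = int((np.abs(corrs) > 0.2).sum())
print(f"Spurious 'significant' correlations: {spurious} of {n_vars - 1}")
```

With these numbers you should expect on the order of 50 “significant” hits even though, by construction, none of the relationships is real; add more variables and the count of false positives rises while the number of meaningful relationships stays at zero.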
The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.
2 Comments
Hi Andrew, nice post. You are very right to point out the inadequacy of implementing a big data system without giving it complete thought. Often the requirements for big data analysis are not well understood by the developers and business owners, resulting in an undesirable product.
To avoid wasting precious time, money, and manpower on these issues, organizations need to develop the expertise and processes to build small-scale prototypes quickly and test them to demonstrate correctness and alignment with business goals.
Following up on this, I came across and registered for a webinar on deploying Big Data solutions rapidly in the cloud through Harbinger’s ABC model (Agile-Big Data-Cloud); it looks promising: http://j.mp/19xJ6ew
Big data is not for everyone. Period.
I somewhat agree with the conclusion but not with the reasoning.
Most businesses will continue grappling with structured data rather than unstructured data; trends in IT spending reflect this. Predictive models will get richer and more complex, resulting in enterprises making better decisions. So big data, for all the hype, will not be very big for most businesses anyway.
Big data models may give false positives mainly due to incorrect models and/or insufficient or wrong data (not so much the false correlations). We still need SMEs to evaluate the decisions from such models. A huge gap I currently see is in problem definition for Big Data, and that gap is only going to widen with IoT and the resulting data explosion.