This is my article, published in Forbes last week.
The volume, velocity and variety characteristics of information assets are not three parts of Gartner’s definition of big data. They are part one, and oftentimes misunderstood. Most people only retain about one-third of what they read, which explains the truncation. However, to get to the essence of the definition, it is well advised, even in our fast-paced time, to comprehend and retain more than what fits in a single tweet. Especially given that Gartner’s big data definition is not much longer than a tweet:
“Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
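The length of this definition is easy to check. The following is a minimal Python sketch, assuming words are separated by whitespace and that each typographic quotation mark counts as one character; the variable names are illustrative only.

```python
# The definition exactly as quoted above, including the typographic quotation marks.
definition = (
    "“Big data” is high-volume, -velocity and -variety information assets "
    "that demand cost-effective, innovative forms of information processing "
    "for enhanced insight and decision making."
)

# Assumption: a "word" is any run of characters between whitespace,
# so "-velocity" and "cost-effective," each count as one word.
word_count = len(definition.split())

# Characters include spaces, punctuation and both quotation marks.
char_count = len(definition)

print(f"{word_count} words, {char_count} characters")
```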
The definition consists of 23 words, or 181 characters including the quotation marks. The quotation marks are a hint that Gartner believes “big data” will be the new normal in the very foreseeable future. I also like that this definition reflects the relativity of big data. I use it in many dialogs with my clients not just to establish common ground, but to point out where the big data challenges and opportunities are. This is how I usually explain it.
Part One: 3Vs
Gartner analyst Doug Laney came up with the famous three Vs back in 2001. In 2011, Gartner identified twelve dimensions of data management, all of which interact with and confound each other. We have four dimensions of Management & Control and four dimensions of Qualification. The three Vs are the driving dimensions of big data Quantification (there is a fourth too).
The most interesting of the 3Vs is variety: companies are digging out amazing insights from text, locations or log files. Elevator logs help to predict vacated real estate, shoplifters tweet about stolen goods right next to the store, and emails contain the communication patterns of successful projects. Most of this data already belongs to organizations, but it is sitting there unused, which is why Gartner calls it dark data. Similar to dark matter in physics, dark data cannot be seen directly, yet it is the bulk of the organizational universe.
Velocity is the most misunderstood data characteristic: it is frequently equated with real-time analytics. Yet velocity is also about the rate of change, about linking data sets that arrive at different speeds, and about bursts of activity rather than a habitual steady tempo. It is important to realize that events in data arise out of the available data and that available data forms its own “social network”. This means that some data serves as a “canary”, other data influences, and yet more data results in decisions. When the temporal relationship between two or more data sets changes (more data suddenly becomes less data), then everything else changes, even the definition of a “data event”.
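The difference between a steady tempo and bursts of activity can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not a standard velocity metric: the timestamps are hypothetical, and the `burstiness` helper (the coefficient of variation of the gaps between events) is just one simple way to quantify unevenness.

```python
from statistics import mean, pstdev

def inter_arrival_gaps(timestamps):
    """Return the gaps between consecutive event times."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

def average_rate(timestamps):
    """Events per second over the whole observation span."""
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])

def burstiness(timestamps):
    """Coefficient of variation of the gaps: 0 for a perfectly steady stream."""
    gaps = inter_arrival_gaps(timestamps)
    return pstdev(gaps) / mean(gaps)

# Hypothetical event times in seconds. Both streams span 80 seconds
# and contain the same number of events.
steady = [0, 10, 20, 30, 40, 50, 60, 70, 80]
bursty = [0, 1, 2, 3, 40, 41, 42, 43, 80]

for name, stream in (("steady", steady), ("bursty", bursty)):
    print(f"{name}: rate {average_rate(stream):.3f} events/s, "
          f"burstiness {burstiness(stream):.2f}")
```

Both streams have the same average rate, so a measure based on speed alone cannot tell them apart; only the spread of the gaps reveals the bursts.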
Volume is about the number of big data mentions in the press and social media. I contribute!
Part Two: Cost-Effective, Innovative Forms of Information Processing
This picture illustrates a typical situation in which all problems are labeled as big data problems.
To sort out what can indeed be solved by the new technologies (and this is not one technology), apply part two of our big data definition. Think about technology capabilities to store and process unstructured data; to link data of various types, origins and rates of change; and to perform comprehensive analysis, which has become possible for many rather than for a select few. Don’t expect inexpensive solutions, but expect cost-effective and appropriate answers to your problems.
One of my clients even asked about “big processing of small data.” That counts.
Part Three: Enhanced Insight and Decision Making
Part Three is the ultimate goal. Business value is in the insights that were not available before. Acting upon the insights is imperative. Missing part three is the most laborious and painful path to the bottom of the Trough of Disillusionment in the Gartner Hype Cycle, especially when parts one and two are present. Other paths to the Trough are also thorny, but necessary on the way to the Slope of Enlightenment.
I tell my clients that their main goals for now are to learn how to identify and formulate big data problems, and to grow their own skills and experience with big data technologies while these technologies are evolving and maturing. Good solutions are possible, although not easy. Just this week I have had several briefings with companies that deliver unique innovations. These innovations combined represent the power of big data. May its force be with you.
Additional analysis is available in the Gartner Special Report, “Big Data, Bigger Opportunities: Investing in Information and Analytics” at http://www.gartner.com/technology/research/big-data.
Follow Svetlana on Twitter @Sve_Sic