One of the mysteries I plan to explore in my research on big data approaches for security is this: why do so many surveys and media reports seem to show (no links here!) that 20%-40% of organizations use big data approaches for security today, when in reality this is not the case – by a long shot?
Let’s see. Here is the canonical definition of “big data”:
“Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. (source)
Notice something interesting: the 3Vs are described as volume, velocity AND variety! If you have a small pile of varied data, say, 10 MB of it, you are definitely not in the big data realm. A huge RDBMS of structured (not varied) records is not big data either. The idea is AND, not OR!
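To make the AND-versus-OR point concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption, not part of the definition quoted above: the `DataSet` class, its three yes/no flags, and both function names are hypothetical, and the definition itself gives no numeric thresholds for what counts as "high".

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    # Hypothetical yes/no judgments; the quoted definition sets no thresholds.
    high_volume: bool
    high_velocity: bool
    high_variety: bool


def is_big_data(d: DataSet) -> bool:
    """Strict reading: all three Vs must hold (AND)."""
    return d.high_volume and d.high_velocity and d.high_variety


def is_big_data_loose(d: DataSet) -> bool:
    """Loose reading: any single V is enough (OR)."""
    return d.high_volume or d.high_velocity or d.high_variety


# A small pile of varied data, e.g. 10 MB of mixed records.
small_varied = DataSet(high_volume=False, high_velocity=False, high_variety=True)

# A huge RDBMS of structured records (velocity assumed low for this example).
huge_rdbms = DataSet(high_volume=True, high_velocity=False, high_variety=False)

for name, d in [("small_varied", small_varied), ("huge_rdbms", huge_rdbms)]:
    print(name, "strict:", is_big_data(d), "loose:", is_big_data_loose(d))
```

Under the strict AND reading neither example qualifies, while the loose OR reading accepts both, which is roughly how almost any data set ends up labeled "big data".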
On the other hand, see how some other people define big data and “big data tools”:
Sorry, guys, but this is SECURITY IDIOTICS, not security analytics. The reality of using big data for security is much rarer – and much more precious….
The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.
I could not possibly agree more! Part of the problem is how many completely unrelated technologies have abused Big Data terminology in their marketing. Excel is apparently Big Data. SIEMs are apparently Big Data. Log Management is apparently Big Data. Anything involving a backend cloud feed is apparently Big Data.
Velocity is the least understood, or applied, aspect of those V’s in security. It implies *streaming* data, and a necessity for *streaming* analytics, not evolution of content, or some other marketing nonsense.
I’d go so far as to say that if it’s not using machine learning pervasively, it’s not Big Data analytics. That’s over-restrictive, for sure, but it cuts out a bunch of Excel / R / Queries & Searches / Manual Manipulation that just doesn’t belong in the category!
I couldn’t make out all the fine print, but that chart may as well add Google Search if they are going to include Excel.
Like most things, what is “Big Data” is in the eyes of the beholder. I always go back to what one is trying to accomplish, not the tools used. In the context of Anton’s blog, what organizations are presumably trying to accomplish in this category is to distill enterprise-scale flows of data (logs, events, network traffic, threat intelligence, vulnerability information, identity information, etc.) so that security-relevant anomalies can be detected, prioritized, investigated, understood, and remediated before the bad guys succeed. Our argument at RSA is that this isn’t possible with traditional SIEM tools, let alone Excel.
>what is “Big Data” is in the eyes of the beholder
Dude, sadly, you just proclaimed yourself to be part of the problem.
Try these for size:
– “what is a car is in the eyes of the beholder”
– “what is black and white is in the eyes of the beholder”
– “what is a program is in the eyes of the beholder”
IT terms should have specific definitions – and so does big data. I can call your golf cart a car, but it won’t magically become that since it is “in the eye of the beholder”…
Based on their definition, Google is definitely big data 🙂
“IT terms should have specific definitions – and so does big data”
Anton – that should be ‘and so should big data’. In the meantime the vendors and marketeers are free to run their grubby little hands all over it as per Rob’s comment at the top. BD went mainstream with TV documentaries (BBC’s Horizon) and adverts on TV and the marketeers are having a field day. Substitute the word Big with Medium or Modest and you can see why they won’t let go.
Couldn’t agree more though, especially ‘Excel’….
Yes, of course you are right: and so SHOULD big data
Hopefully we’ll arrive at a precise definition soon – and leave the media to argue about their fuzzy one.
I do not even know how I ended up here, but I thought this post was great. I don’t know who you are, but you are certainly going to be a famous blogger if you aren’t already 😉 Cheers!