I was reading an excellent GTP piece on big data analytics called "Hadoop and MapReduce: Big Data Analytics" the other day. It is a very useful assessment of big data technologies and approaches. The author does not focus on security uses of Hadoop; instead, he assesses how big data analytics differs from traditional data analysis (the OLTP, ODS, OLAP, EDW, BI, CEP alphabet soup) across all domains. The piece would come in handy to educate all the screeching big data monkeys, if only they could be convinced to read a 41-page document…
But I digress. What struck me in this research were these two pictures, in which he compares the approaches by data volume and by processing complexity:
They are relevant to the [attempted] use of big data analytics for security. To understand my point, you only need to know what "CEP" stands for: Complex Event Processing (aka "correlation" in SIEM land). Now look at the right picture: if you think that "SIEM correlation is complex," you should not even try to come within a 5-mile radius of a Hadoop install, since it is MUCH harder. Further, look at the left picture: if you mostly deal with nicely structured data (even if lots of it), big data methods may not be the best fit.
Keep this in mind the next time somebody tries to sell you a "Hadoop cluster with a Pig on top in place of a SIEM" vision…
Finally, a few insightful quotes from the same document:
- "The tools/techniques, architectural patterns, and use cases where big data analysis is applicable is very much a work in progress." (therefore, you MUST be prepared to experiment rather than merely consume)
- “Big data analytics is undergoing huge (at times chaotic) change and innovation. Big data analysis is new and has few best practices; expect to learn through failures and the success of peers.” (therefore, what you know to be true today may not be tomorrow)
- “Installing, configuring, and administering a production-scale Hadoop cluster requires considerable system administration expertise. Interacting with Hadoop requires a detailed knowledge of programming languages.” (if you think SQL is hard, MapReduce will kill ya dead…)
- “Data sampling is dead, and the success of a big data analytics initiative cannot be shown with a small-scale pilot.” (and, no, Excel is not a big data tool)
Just some food for thought…