I was reading an excellent GTP piece on big data analytics called “Hadoop and MapReduce: Big Data Analytics” the other day. It is a very useful assessment of big data technologies and approaches. The author is not writing about the security use of Hadoop, but assesses how big data analytics differs from traditional data analysis (OLTP, ODS, OLAP, EDW, BI, CEP alphabet soup) across all domains. The piece would come in handy to educate all the screeching big data monkeys – if only they could be convinced to read a 41-page document….
But I digress. What struck me in this research are these two pictures where he compares the approaches based on data volume and processing complexity:
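[Figures not reproduced: two pictures, left and right, from the GTP document comparing the approaches by data volume and processing complexity.]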
They have relevance for [attempted] use of big data analytics for security. To understand my point, you only need to know what “CEP” stands for – Complex Event Processing (aka “correlation” in SIEM land). Now look at the right picture: if you think that “SIEM correlation is complex”, you should not even try to come within a 5-mile radius of a Hadoop install, since it is MUCH harder. Further, look at the left picture: if you mostly deal with nicely structured data (even if lots of it), big data methods may not be the best fit.
Keep this in mind the next time somebody tries to sell you a “Hadoop cluster with a Pig on top in place of a SIEM” vision….
Finally, a few insightful quotes from the same document:
- “The tools/techniques, architectural patterns, and use cases where big data analysis is applicable is very much a work in progress.” (therefore you MUST be prepared to experiment, rather than consume)
- “Big data analytics is undergoing huge (at times chaotic) change and innovation. Big data analysis is new and has few best practices; expect to learn through failures and the success of peers.” (therefore, what you know to be true today may not be tomorrow)
- “Installing, configuring, and administering a production-scale Hadoop cluster requires considerable system administration expertise. Interacting with Hadoop requires a detailed knowledge of programming languages.” (if you think SQL is hard, MapReduce will kill ya dead…)
- “Data sampling is dead, and the success of a big data analytics initiative cannot be shown with a small-scale pilot.” (and, no, Excel is not a big data tool)
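To make the third quote a bit more concrete, here is a minimal sketch (plain Python, no Hadoop involved) of what a trivial SQL aggregation looks like once you have to spell it out as map, shuffle, and reduce steps. The event fields (`src_ip`, `action`), the sample data, and the function names are all my own illustrative assumptions, not anything from the GTP document, and a real Hadoop job would add job configuration, serialization, and cluster plumbing on top of this.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# The SQL equivalent of everything below would be roughly:
#   SELECT src_ip, COUNT(*) FROM events
#   WHERE action = 'login_failed' GROUP BY src_ip;

# Hypothetical sample events; field names are illustrative assumptions.
EVENTS = [
    {"src_ip": "10.0.0.1", "action": "login_failed"},
    {"src_ip": "10.0.0.2", "action": "login_ok"},
    {"src_ip": "10.0.0.1", "action": "login_failed"},
    {"src_ip": "10.0.0.3", "action": "login_failed"},
]


def map_phase(event: Dict[str, str]) -> Iterator[Tuple[str, int]]:
    """Emit (src_ip, 1) for each failed login; this plays the WHERE role."""
    if event["action"] == "login_failed":
        yield event["src_ip"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key; a framework normally does this step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the values for one key; this plays the COUNT(*) role."""
    return key, sum(values)


def failed_logins_by_ip(events: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for event in events for pair in map_phase(event))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(failed_logins_by_ip(EVENTS))
```

One line of SQL turns into three functions and a driver, and that is before anything is distributed across machines.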
Just some food for thought….
The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.
2 Comments
Big Data Security Analytics is achievable.
Many of the lessons from the Business Intelligence world have not yet been applied to security analytics.
1) There is a deeper understanding of risk in companies that deal in risk as a day-in, day-out commodity.
2) Organizations of these types are:
– Big Tobacco
– Insurance
– Medical Device Manufacturers
– Pharma
– Financials
3) Companies like these tend to view information security problems as part of a more holistic set of problems and potential solutions in the context of a larger “Information Risk” umbrella.
Big Data security analytics is not only achievable; an Enterprise Information Risk Management Platform (EIRMP), with integrated BI to help manage risk using more than just security data, is a real possibility.
Big Data allows us to enmesh PeopleSoft performance review data with travel system data, coupled with corporate spending and badging system data, along with external attack pattern records, and internal network traffic data, combined with local weather patterns and traffic, for example, all together.
The Information Risk analytics capabilities offered by new data technologies are very promising indeed.
William, thanks a lot for the insightful comment! Indeed, BI + RDBMS (not big data) is where orgs should probably look first, before they delve into the largely uncharted land of Hadoop et al….