Anton Chuvakin

A member of the Gartner Blog Network


Anton Chuvakin is a research director at Gartner's IT1 Security and Risk Management group. Before Mr. Chuvakin joined Gartner, his job responsibilities included security product management, evangelist…


Big Data for Security Realities: Case 1: Too Much Volume To Store aka “Big Data Collection”

by Anton Chuvakin  |  October 10, 2013

If you fertilize the field of big data with enough marketing bullshit, something will grow. Well, keep waiting for it :-) Use of “big data analytics” approaches for security seems like THE most “bullshit-rich” area of the entire infosec realm (beating such worthy contenders as APT, DLP, BYOD and, of course, “cyber”). However, there ARE definitely end-user organizations doing it for real (and not just the illustrious crew at Zions Bancorporation).

Part of my research this quarter focuses on assessing the reality of big data for security and providing practical, GTP-style recommendations for enterprises. This post is the first in my “reality files” dedicated to the use of big data approaches for information security.

One case that keeps popping up on my radar (which is programmed to scan only reality, not the realm of wishful thinking and obnoxious PowerPoint slides) is the case of “too much data volume to store” or “big data collection.” Specifically, this scenario often goes like this:

  1. An organization buys a SIEM for, say, $1,000,000 (admittedly, not that much, as far as large enterprise SIEM pricing is concerned…) and likes it
  2. Quickly, they realize they can only store 14-30 days of data inside the SIEM operational data store (be it an RDBMS or a columnar backend)
  3. They reach out to vendors with a log management RFP, with a requirement to store 3 years of raw data (say, with total volume in high TBs or low PBs)
  4. Soon, they get quotes back – and it is *ANOTHER* $1,000,000!
  5. At this point, the team goes “Darn! We can build it ourselves for 10% of that”
  6. And their Hadoop cluster is born…
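
To get a feel for why the retention numbers in this scenario blow up so quickly, here is a back-of-the-envelope sketch in Python. The daily volume (1 TB/day) and the 8:1 compression ratio are purely illustrative assumptions, not figures from any of the organizations mentioned; the replication factor of 3 is the HDFS default. The function name `retained_volume_tb` is made up for this example.

```python
def retained_volume_tb(daily_raw_tb, retention_days,
                       compression_ratio=1.0, replication=1):
    """Approximate on-disk volume (TB) needed for a retention window.

    daily_raw_tb      -- raw log volume per day, in TB (assumed input)
    retention_days    -- how many days of data must be kept
    compression_ratio -- raw size divided by compressed size (1.0 = none)
    replication       -- number of copies kept on disk
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return daily_raw_tb * retention_days / compression_ratio * replication


DAILY_RAW_TB = 1.0  # assumption: 1 TB of raw logs per day

# What fits in the SIEM operational store (upper end of the 14-30 day range).
siem_window = retained_volume_tb(DAILY_RAW_TB, 30)

# What the log management RFP asks for: 3 years of raw data.
three_years = retained_volume_tb(DAILY_RAW_TB, 3 * 365)

# The same 3 years on a cluster, with an assumed 8:1 compression ratio
# and 3 copies of every block.
cluster_on_disk = retained_volume_tb(
    DAILY_RAW_TB, 3 * 365, compression_ratio=8.0, replication=3
)

print(f"SIEM window (30 days):        {siem_window:8.1f} TB")
print(f"3 years, raw:                 {three_years:8.1f} TB")
print(f"3 years, compressed, 3 copies: {cluster_on_disk:8.1f} TB")
```

Under these assumptions, three years of raw data lands at roughly a petabyte, which is consistent with the “high TBs or low PBs” range above. Plug in your own daily volume; the point is only that the gap between 30 days and 3 years is a factor of about 36, before any compression or replication.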

Admittedly, they will later face challenges with streaming data from collectors to both SIEM and Hadoop (or through SIEM to Hadoop), linking the systems for seamless drilldown, running searches (Hadoop grep, anybody?), and selectively “structuring” the data from the cluster. Some of these are merely challenging, while others are extreme (try picking the right data to process from a huge pile and then being sure that you picked all of what you needed). However, in several cases that I’ve seen, the organizations were happy with what emerged: simply knowing that “the data is there” and that “they have not paid a ton for it” was comforting for them. Note that in this case the organization uses the system for retention and occasional ad hoc queries (such as during an incident), not for any analytics (though analytics is usually on the roadmap for some remote future time…).
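
For readers wondering what a “Hadoop grep” style ad hoc query boils down to, here is a minimal sketch in Python. It is only the line-filtering core: a function that scans raw log lines and keeps those matching a regular expression. With Hadoop Streaming, a script like this would typically read lines from standard input and write matches to standard output as a map-only step; that wiring is left out here. The function name `grep_lines` and the sample log lines are invented for illustration and do not come from any real system.

```python
import re


def grep_lines(lines, pattern):
    """Yield every line that matches the given regular expression.

    lines   -- any iterable of strings (a list, an open file, sys.stdin)
    pattern -- a regular expression in Python `re` syntax
    """
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")


# Made-up sample of raw log lines, standing in for data on the cluster.
SAMPLE = [
    "2013-10-01T12:00:03Z fw01 DENY src=10.1.2.3 dst=192.0.2.10 dport=445",
    "2013-10-01T12:00:04Z fw01 ALLOW src=10.1.2.4 dst=192.0.2.11 dport=443",
    "2013-10-01T12:00:05Z vpn02 LOGIN user=alice src=10.1.2.3",
]

# Incident-style question: "show me everything from this source address".
hits = list(grep_lines(SAMPLE, r"src=10\.1\.2\.3\b"))

for hit in hits:
    print(hit)
```

Note what this does not solve: the filter only finds what the pattern describes. If the same address shows up in another field or another format in a different log source, a raw grep will silently miss it, which is exactly the “being sure that you picked all of what you needed” problem.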

More “big data for security” reality files coming soon!
