Gartner Blog Network

On “Output-driven” SIEM

by Anton Chuvakin  |  September 24, 2012  |  4 Comments

Here is a great term I picked up from another of the SIEM literati: “output-driven SIEM.” This simply means deploying your security information and event management tool in such a way that NOTHING comes into your SIEM unless and until you know how it will be used and/or presented. Thus, only existing or planned reports, visuals, alerts, dashboards, profiling algorithms, context fusion or whatever other means of using the data can make a SIEM implementer “open the floodgates” and admit a particular log type into the tool. If a process exists outside of the SIEM tool that will make use of the SIEM data, that qualifies as well. In this model, goals drive security requirements, requirements drive use cases, and use cases drive functionality and collection scope. By the way, this model is as well-known and effective … as it is, sadly, uncommon among the organizations deploying SIEM tools today. “Now that we have all this data [and now that our SIEM is very slow], how do we use it?” is much more common….
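The “use cases drive collection scope” chain can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `UseCase` class, the use-case names and the log-type labels are hypothetical and do not come from any particular SIEM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """A planned output (alert, report, dashboard...) and the log types it consumes."""
    name: str
    output: str
    log_types: frozenset


# Hypothetical use cases; a real deployment would derive these from its own goals.
USE_CASES = [
    UseCase("credential-abuse", "alert", frozenset({"windows-auth", "unix-auth", "vpn"})),
    UseCase("remote-access-report", "report", frozenset({"vpn"})),
]


def collection_scope(use_cases):
    """Map each admitted log type to the names of the use cases that need it."""
    scope = {}
    for uc in use_cases:
        for log_type in uc.log_types:
            scope.setdefault(log_type, []).append(uc.name)
    return scope


def admit(log_type, scope):
    """Output-driven rule: a log type enters the SIEM only if some use case consumes it."""
    return log_type in scope


scope = collection_scope(USE_CASES)
print(admit("vpn", scope))             # True
print(admit("printer-status", scope))  # False
```

The point of keeping the use-case names in `scope` is that every admitted log type can be traced back to the output that justifies it.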

For example, if your goal is to detect when your users abuse their access credentials (or when somebody steals those credentials), the requirements will call for login-counting correlation rules, user activity profiling and associated reporting on user access data. Thus, various types of authentication records (Unix syslog and Windows event logs, access control and remote access server logs, VPN, etc.) need to be collected.
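As a rough idea of what a “login-counting correlation rule” might look like, here is a minimal Python sketch. The class name, the threshold and the window length are assumptions chosen for illustration, and the sketch assumes records arrive in timestamp order; real SIEM rule languages look quite different.

```python
from collections import defaultdict, deque


class FailedLoginCounter:
    """Fires for a user once `threshold` failed logins fall within `window_seconds`."""

    def __init__(self, threshold=5, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # user -> timestamps of recent failures

    def observe(self, user, timestamp, success):
        """Feed one authentication record; return True if the rule fires."""
        if success:
            return False
        recent = self._failures[user]
        recent.append(timestamp)
        # Drop failures that have aged out of the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.threshold


# Hypothetical records: (user, timestamp in seconds, login succeeded?)
rule = FailedLoginCounter(threshold=3, window_seconds=60)
events = [
    ("alice", 0, False),
    ("alice", 10, False),
    ("bob", 15, False),
    ("alice", 20, False),
    ("alice", 200, False),
]
fired = [(user, ts) for user, ts, ok in events if rule.observe(user, ts, ok)]
print(fired)  # [('alice', 20)]
```

The rule only needs authentication records, which is exactly why those are the log types this use case admits into the SIEM.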

Now, this is dramatically different from the approach one should take with broad-scope log management, aimed at general system troubleshooting or incident response support. This is where being “input-driven” and getting every possible bit of data in would be admirable. Collecting “100% of all logs,” piling them in Hadoop, having them ready for use, etc. works brilliantly there – grab the data now and sort it out later, don’t dwell on choosing collection-time filters. However, doing the same with a SIEM is a great way to turn your deployment into a quivering, jumbled mess of barely performing components and oodles of “crap-ta” (a hybrid of “crap” and “data”, as you can guess). “Big” or “small”, unused data just does not help the SIEM perform its security mission well.

How does this difference matter in real-world deployments?

Every log line going into a SIEM tool “costs” (and sometimes actually costs – i.e. in dollars and not just in computing resources) much more than a log line dropped into a log aggregator. $50,000 for an appliance that handles 100,000 EPS (events per second) sounds like a great log management price, while SIEM deployments where 100,000 log messages are actually analyzed by a SIEM every second are both rare and really expensive (likely well into seven-digit territory).
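To put rough numbers on that gap, here is a back-of-the-envelope calculation in Python. The EPS figure and the $50,000 price are from the paragraph above; the $1,000,000 SIEM figure is an assumption, namely the smallest seven-digit price, so the resulting ratio is a floor rather than an estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60

eps = 100_000                 # events per second, from the text
log_mgmt_price = 50_000       # dollars, from the text
siem_price_floor = 1_000_000  # assumption: the smallest seven-digit price

events_per_day = eps * SECONDS_PER_DAY
log_mgmt_per_eps = log_mgmt_price / eps   # dollars per sustained event/second
siem_per_eps = siem_price_floor / eps
ratio = siem_per_eps / log_mgmt_per_eps

print(events_per_day)    # 8640000000
print(log_mgmt_per_eps)  # 0.5
print(siem_per_eps)      # 10.0
print(ratio)             # 20.0
```

Even at that floor, each unit of SIEM analysis capacity costs about 20 times what the same unit of log aggregation capacity does, and “well into” seven digits only widens the gap.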

Admittedly, “output-driven SIEM” is hard work. It makes soooooo much sense to “just collect it for now” and then “figure out how to use it later.” In many cases, however, this means that your deployment will be stuck. Sometimes it may work for you – but please be aware that for many people who thought that “it would work for them,” it actually did not. At this point, it should be obvious to most readers that combining “input-driven” log aggregation and “output-driven” SIEM analysis is still the best way to go for most organizations. And, yes, as with every great useful rule, it has great useful exceptions …

On the architecture side, if your SIEM includes log management components (like most do today), the same logic applies: that aggregator component will see all of the data, while core SIEM analysis components and dashboards will only see the data that needs to be there. For two distinct tools, this “magic” is achieved via filters that are deployed between a log management system and a SIEM.
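The filter that sits between a log management system and a SIEM can be pictured as a simple routing step. The sketch below is an assumption-laden illustration: the event dictionaries, the `type` field and the set of SIEM-bound log types are all hypothetical, and real products implement this with their own forwarding and filtering configuration.

```python
def route(events, siem_filter):
    """Send every event to the aggregator; forward to the SIEM only what the filter admits."""
    aggregator, siem = [], []
    for event in events:
        aggregator.append(event)  # input-driven: keep everything
        if siem_filter(event):    # output-driven: forward only what is used
            siem.append(event)
    return aggregator, siem


# Hypothetical log types that existing SIEM use cases actually consume.
SIEM_LOG_TYPES = {"windows-auth", "unix-auth", "vpn"}


def needed_by_siem(event):
    """Filter predicate: admit only log types that some SIEM use case consumes."""
    return event["type"] in SIEM_LOG_TYPES


events = [
    {"type": "vpn", "msg": "login ok user=alice"},
    {"type": "printer-status", "msg": "toner low"},
    {"type": "unix-auth", "msg": "Failed password for bob"},
]
aggregator, siem = route(events, needed_by_siem)
print(len(aggregator), len(siem))  # 3 2
```

The aggregator still holds the printer message for troubleshooting or incident response; the SIEM simply never has to process it.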

So, think about using the data before you admit it into a SIEM!

Anton Chuvakin
Research VP
5+ years with Gartner
16 years IT industry

Anton Chuvakin is a research VP at Gartner's GTP Security and Risk Management group. Before Mr. Chuvakin joined Gartner, his job responsibilities included security product management, evangelist…

Thoughts on On “Output-driven” SIEM

  1. Paul Oneil says:

    I’ve argued SIEMs are often overrated, at least in terms of the claims vendors make when they boast of capabilities that are difficult to implement, such as correlated alerting. The burden should be on the vendors to show advancements here. Simply offering up a brute-force correlated alert is hardly impressive. Meanwhile, most SIEM deployments of the last few years “break” under intense loads or you have a very small window of live data to work with. IBM research has been developing a new SIEM architecture that looks impressive. Wait and see. Otherwise, everything that can be achieved with a SIEM in most deployments comes down to simple thresholding and filtering for whatever you are looking for. Most everything else is overlooked or overrated.

  2. Paul, thanks for your insightful comment. Indeed, plenty of people bought SIEM tools and got little value out of them. However, I’d argue (here and in my research) that in many cases customers failed to actually use the tool and had “magic bullet” expectations.

    Indeed, “the burden should be on the vendors to show advancements here”, but some burden is definitely on the users as well.

    “most SIEM deployments of the last few years “break” under intense loads or you have a very small window of live data to work with. ” = most SIEM deployments are simply under-spec’d in terms of hardware :-(

  3. Paul Oneil says:

    Well, let me phrase it another way. Big data… hence, newer SIEM architectural design choices contemplate some filtering of data prior to storage…

  4. Sorry, didn’t get the point of the comment. In some recent big data-like security data collection efforts people don’t filter on collection (their thinking is: Hadoop can hold everything). If they know what to do with all the data, that is fine, I guess….
