
On “Output-driven” SIEM

By Anton Chuvakin | September 24, 2012 | 4 Comments


Here is a great term I picked up from another SIEM literati: “output-driven SIEM.” This simply means deploying your security information and event management tool in such a way that NOTHING comes into your SIEM unless and until you know how it would be utilized and/or presented. Thus, only existing/planned reports, visuals, alerts, dashboards, profiling algorithms, context fusion or whatever other means of using the data can lead a SIEM implementer to “open the floodgates” and admit a particular log type into the tool. If a process exists outside of the SIEM tool that will make use of the SIEM data, that qualifies as well. In this model, goals drive security requirements, requirements drive use cases, use cases drive functionality and collection scope. By the way, this model is as well-known and effective … as it is, sadly, uncommon among the organizations deploying SIEM tools today. “Now that we have all this data [and now that our SIEM is very slow], how do we use it?” is much more common….

For example, if your goal is to make it possible to detect when your users abuse access credentials (or when somebody steals their credentials), requirements will call for login-counting correlation rules, user activity profiling as well as associated reporting on user access data. Thus, various types of authentication records (Unix syslog and Windows event logs, access control and remote access server logs, VPN, etc) need to be collected.
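The chain above — goals drive requirements, requirements drive use cases, use cases drive collection scope — can be sketched in a few lines. This is purely illustrative: the use-case names, log-source labels, and data structures are my assumptions, not any vendor's actual configuration format.

```python
# Hypothetical sketch of output-driven collection scoping: a log source
# is admitted into the SIEM only if at least one planned output
# (rule, report, profiling algorithm) justifies it.

USE_CASES = {
    "credential_abuse_detection": {
        "outputs": ["login-counting correlation rule",
                    "user activity profiling",
                    "user access report"],
        "required_sources": ["unix_syslog_auth", "windows_security_log",
                             "vpn_logs", "remote_access_logs"],
    },
}

def collection_scope(use_cases):
    """Return only the log sources tied to at least one planned output."""
    scope = set()
    for case in use_cases.values():
        if case["outputs"]:  # no planned output -> no collection
            scope.update(case["required_sources"])
    return sorted(scope)

print(collection_scope(USE_CASES))
```

A log type with an empty `outputs` list simply never enters the scope — which is the whole point of the model.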

Now, this is dramatically different from the approach one should take with broad-scope log management, aimed at general system troubleshooting or incident response support. This is where being “input-driven” and getting every possible bit of data in would be admirable. Collect “100% of all logs,” pile them in Hadoop, have them ready for use, etc. works brilliantly there – take the data now and sort it out later, don’t dwell on choosing collection-time filters. However, doing the same with a SIEM is a great way to turn your deployment into a quivering, jumbled mess of barely performing components and oodles of “crap-ta” (a hybrid of “crap” and “data”, as you can guess). “Big” or “small”, unused data just does not help the SIEM perform its security mission well.

How does this difference matter in real-world deployments?

Every log line going into a SIEM tool “costs” (and sometimes actually costs – i.e. in dollar terms, not just in computing resources) much more than a log line dropped into a log aggregator. $50,000 for an appliance that handles 100,000 EPS sounds like a great log management price, while SIEM deployments where 100,000 log messages are actually analyzed by a SIEM every second are both rare and really expensive (likely well into seven-digit territory).

Admittedly, “output-driven SIEM” is hard work. It makes soooooo much sense to “just collect it for now” and then “figure out how to use it later.” In many cases, however, this means that your deployment will be stuck. Sometimes it may work for you – but please be aware that for many people who thought that “it would work for them,” it actually did not. At this point, it should be obvious to most readers that combining “input-driven” log aggregation and “output-driven” SIEM analysis is still the best way to go for most organizations. And, yes, as with every great useful rule, it has great useful exceptions …

On the architecture side, if your SIEM includes log management components (like most do today), the same logic applies: that aggregator component will see all of the data, while core SIEM analysis components and dashboards will only see the data that needs to be there. For two distinct tools, this “magic” is achieved via filters that are deployed between a log management system and a SIEM.
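That filter between the two tiers can be sketched very simply. A minimal sketch, assuming a generic event dictionary with a `type` field and an explicit allow-list — none of this reflects any real product's API:

```python
# Hypothetical routing filter between an "input-driven" log aggregator
# and an "output-driven" SIEM: the aggregator keeps everything, while
# the SIEM only receives log types on an explicit allow-list.

SIEM_ALLOWED_TYPES = {"auth", "vpn", "firewall_deny"}

def route(event, aggregator_store, siem_queue):
    """Store every event; forward only allow-listed types to the SIEM."""
    aggregator_store.append(event)               # 100% of logs land here
    if event.get("type") in SIEM_ALLOWED_TYPES:  # floodgate stays shut otherwise
        siem_queue.append(event)

store, queue = [], []
for ev in [{"type": "auth", "msg": "login failed"},
           {"type": "debug", "msg": "cache miss"}]:
    route(ev, store, queue)

print(len(store), len(queue))  # aggregator sees both; SIEM sees one
```

The debug event stays available for troubleshooting in the aggregator but never burdens the SIEM's correlation engine — exactly the split described above.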

So, think about using the data before you admit it into a SIEM!


The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.



  • Paul Oneil says:

I’ve argued SIEMs are often overrated. At least in the terms the vendors set, where they boast of capabilities that are difficult to implement, such as correlated alerting. The burden should be on the vendors to show advancements here. Simply offering up a brute-force correlated alert is hardly impressive. Meanwhile, most SIEM deployments of the last few years “break” under intense loads, or you have a very small window of live data to work with. IBM research has been developing a new SIEM architecture that looks impressive. Wait and see. Otherwise, everything that can be achieved with a SIEM in most deployments comes down to simple thresholding and filtering for whatever you are looking for. Most everything else is overlooked or overrated.

  • Paul, thanks for your insightful comment. Indeed, plenty of people bought SIEM tools and got little value out of it. However, I’d argue (here and in my research) that in many cases customers failed to actually use the tool and had “magic bullet” expectations.

Indeed, “the burden should be on the vendors to show advancements here”, but some burden is definitely on the users as well.

    “most SIEM deployments of the last few years “break” under intense loads or you have a very small window of live data to work with. ” = most SIEM deployments are simply under-spec’d in terms of hardware 🙁

  • Paul Oneil says:

    Well, let me phrase it another way. Big data… hence, newer SIEM architectural design choices contemplate some filtering of data prior to storage…

  • Sorry, didn’t get the point of the comment. In some recent big data-like security data collection efforts people don’t filter on collection (their thinking is: Hadoop can hold everything). If they know what to do with all the data, that is fine, I guess….