
SIEM Real-time and Historical Analytics Collide?

By Anton Chuvakin | July 30, 2014 | 2 Comments


SIEM technology has evolved to a point where conflicting requirements are starting to tear it apart (and I am not the only one to observe that). The two conflicting requirements are:

  • Just as at its birth in the late 1990s, today’s SIEM must excel at real-time analysis: it uses rule-based correlation and other methods to analyze thousands of events per second streaming from the collectors, in order to detect threats affecting the organization.
  • At the same time, SIEM is expected to execute searches and interactive queries as its users go through historical data, match indicators and run algorithms to extract value from stored pools of data.
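To make the first mission concrete, here is a minimal sketch (in Python, chosen purely for illustration) of a rule-based correlation over an event stream. It alerts when one source IP produces several failed logins within a short window. The `Event` fields, the `FailedLoginRule` class and the threshold and window values are hypothetical. None of this is any SIEM vendor's rule syntax. The rule keeps only a few recent timestamps per source in memory. That is why real-time analysis tends to be bound by memory and CPU rather than by storage.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict


@dataclass
class Event:
    timestamp: float  # seconds since epoch
    source_ip: str
    action: str       # e.g. "login_failure" (hypothetical field values)


class FailedLoginRule:
    """Fire when one source IP logs `threshold` failures within `window` seconds."""

    def __init__(self, threshold: int = 5, window: float = 60.0) -> None:
        self.threshold = threshold
        self.window = window
        # Per-source timestamps of recent failures: the only state the rule keeps.
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)

    def process(self, event: Event) -> bool:
        """Consume one event from the stream; return True if the rule fires."""
        if event.action != "login_failure":
            return False
        times = self.recent[event.source_ip]
        times.append(event.timestamp)
        # Evict failures that have slid out of the correlation window.
        while times and event.timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so one burst raises one alert
            return True
        return False


if __name__ == "__main__":
    rule = FailedLoginRule(threshold=3, window=10.0)
    for t in (0, 2, 4):
        fired = rule.process(Event(t, "10.0.0.5", "login_failure"))
        print(t, fired)  # fires on the third failure within 10 seconds
```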

For years, the dirty truth of SIEM was that most installations kept only 7-14 days of log data inside the SIEM. This limited SIEM’s mission primarily to the first point above: real-time and short-term analysis inside a SOC [short-term historical analysis over, say, 7 days of data is indeed very useful, but it does not solve all the same problems as a multi-month one]. Sure, you can reload older data (yuck!) or peek into a connected log management tool that has much more data but lacks the analytical brainpower [well, unless you build it yourself]. Thus, if you want to go longer AND analyze the data (a key point!), your choices are:

  1. Buy more SIEM at an obscene cost; some vendors’ technology will scale, but your wallet will not. Economic DoS strikes back?
  2. Use log management with limited analysis capabilities (indexed search and eh… actually, that’s it sometimes). New hope?
  3. Build or procure some other tool (big data something or other). The return of BDSA (big data security analytics)?
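Why does longer retention hurt the wallet so much? A back-of-the-envelope calculation helps. The sketch below assumes a hypothetical 5,000 events per second (within the "thousands of events per second" range mentioned earlier) and an assumed average of 500 bytes per raw event. It ignores compression, indexing and replication, all of which change the real numbers.

```python
def raw_log_volume_gb(events_per_second: float, bytes_per_event: float, days: float) -> float:
    """Raw (uncompressed, unindexed) log volume in gigabytes (10**9 bytes)."""
    seconds_per_day = 86_400
    return events_per_second * bytes_per_event * seconds_per_day * days / 1e9


if __name__ == "__main__":
    # Hypothetical inputs: 5,000 EPS at an assumed ~500 bytes per raw event.
    for days in (7, 14, 180):
        print(f"{days:>3} days: {raw_log_volume_gb(5_000, 500, days):,.0f} GB")
```

Under these assumptions, the data grows at 216 GB of raw logs per day. That comes to about 3 TB for 14 days versus about 39 TB for 180 days, nearly 13 times more data for the same SIEM to store and search.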

One enlightened fellow, upon reading my recent SIEM Evaluation Criteria document, noted that in his view the criteria are too biased towards real-time, traditional SOC monitoring use of SIEM at the expense of historical, long-term analytics. Even though historical algorithms, data exploration and profiling are featured in the report, he is right. SIEM has evolved primarily as a monitoring technology, with investigative use and historical analysis often present, but in an auxiliary role at best. In essence, we have REAL-TIME ANALYSIS (via SIEM) and HISTORICAL AGGREGATION (via log management tools, the ELK stack, etc.).

And now, many organizations are flocking towards hidden/persistent/advanced threat discovery and longer-term profiling. These call for longer retention and stress the data stores with queries that are both wide and deep. For example, read this enlightening thread on SIEM, log management and analytics. “Searching the last ‘N Days’ [especially for large values of ‘N’ – A.C.] of logs is much different than alarming and alerting on logs as they come in – they are very different” is a representative quote. Searching over 180 days of data will kill a SIEM [assuming merely having 180 days of data in it hasn’t killed it already]. Actually running algorithms (profiling, clustering, rule learning and other stuff I mentioned here) will be much worse. Back in the day when I was doing it, my not-too-sophisticated profiling computations ran overnight over a mere week of data [and I used an RDBMS, since nothing else was around in 2004]…
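The difference the quote describes shows up clearly in a sketch of a retrospective indicator search ("did we ever talk to this bad IP?"). A rule applied to events as they arrive does a fixed amount of work per event. This search instead grows with N, because every stored day in range has to be read back. The day-partitioned dict store, the `dest_ip` field and the function name are hypothetical stand-ins for whatever log store you actually have.

```python
from datetime import date, timedelta
from typing import Dict, List, Set

# Hypothetical log store: raw events partitioned by calendar day.
DailyStore = Dict[date, List[dict]]


def search_last_n_days(store: DailyStore, today: date, n_days: int,
                       indicators: Set[str]) -> List[dict]:
    """Retro-hunt: return stored events from the last n_days whose dest_ip is a known indicator.

    The loop touches every partition in range, so the cost grows linearly
    with n_days; on a real store this usually means reading data back from disk.
    """
    hits = []
    for offset in range(n_days):
        day = today - timedelta(days=offset)
        for event in store.get(day, []):
            if event.get("dest_ip") in indicators:
                hits.append(event)
    return hits


if __name__ == "__main__":
    store = {
        date(2014, 7, 30): [{"dest_ip": "203.0.113.7", "user": "alice"}],
        date(2014, 7, 29): [{"dest_ip": "198.51.100.1", "user": "bob"}],
        date(2014, 5, 1): [{"dest_ip": "203.0.113.7", "user": "carol"}],
    }
    bad_ips = {"203.0.113.7"}  # documentation-range IP used as a fake indicator
    print(len(search_last_n_days(store, date(2014, 7, 30), 7, bad_ips)))    # 1
    print(len(search_last_n_days(store, date(2014, 7, 30), 180, bad_ips)))  # 2
```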

So how do we balance SIEM’s dual mission today? Let’s think it through together. Please treat this table as more of an “incomplete thought” than a research product, BTW.

| Aspect | Real-time and near-term analysis | Historical analysis |
|---|---|---|
| Object of analysis | A stream of data or a small puddle of data | A huge pile of data |
| Storage | Short term (a few days) | Long term (months to years) |
| Data | Usually structured: logs after normalization | May be unstructured: raw logs, indexed |
| Analysis types | Mostly known patterns, statistics on data fields | Mostly interactive exploration and models |
| Common performance bottlenecks | Processing streams: memory, CPU | Storing and querying: storage, I/O |
| Focus | Detect threats | Discover threats |
| Usage | Utilize found patterns for alerting | Learn about patterns of data |

(also see this table to better understand the difference in usage)

Still, SIEM can actually benefit from its duality. Some organizations mine the historical data and then create rules based on patterns revealed by algorithms. Others create alerts based on what their analysts have dug out during threat-hunting activities. In the past, I always voted for “first log management, then SIEM.” Now, with the increased focus on historical and longer-term analysis, this may change to “log management → SIEM → long-term analytics” or even “log management → long-term analytics → SIEM.” Let’s think about the choices, then:

  1. Want to collect the data and keep it for incident response/compliance? Get log management (commercial or OSS).
  2. Want to set up a SOC and real-time alerting and monitoring, and make analyst workflows better? Get a good SIEM (ideally, you should have log management by now).
  3. Want to dig deep into historical data analysis over the longer term, match indicators and explore the data? You are in big data territory now, and you are mostly on your own with regard to tools.
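The mine-then-alert loop described above, where historical analysis produces rules for real-time use, can be sketched roughly like this. The baseline method is a deliberately simple stand-in chosen for illustration, not a technique this post prescribes: each entity's limit is the mean plus k population standard deviations of its past daily event counts. The entity names, the k=3 default and the function names are hypothetical. The expensive step (`learn_thresholds`, run over months of history) happens offline. Its output is a cheap lookup a SIEM-style rule could evaluate as events arrive.

```python
import statistics
from typing import Callable, Dict, List


def learn_thresholds(history: Dict[str, List[int]], k: float = 3.0) -> Dict[str, float]:
    """Historical step: per entity, limit = mean + k * population stdev of past daily counts."""
    thresholds = {}
    for entity, daily_counts in history.items():
        mean = statistics.fmean(daily_counts)
        spread = statistics.pstdev(daily_counts)
        thresholds[entity] = mean + k * spread
    return thresholds


def make_alert_rule(thresholds: Dict[str, float]) -> Callable[[str, int], bool]:
    """Real-time step: wrap the learned limits in a cheap per-entity check."""
    def rule(entity: str, todays_count: int) -> bool:
        limit = thresholds.get(entity)
        # Entities with no history have no baseline, so they never fire here.
        return limit is not None and todays_count > limit
    return rule


if __name__ == "__main__":
    history = {"alice": [10, 10, 10, 10], "bob": [4, 6, 4, 6]}  # hypothetical daily counts
    rule = make_alert_rule(learn_thresholds(history))
    print(rule("bob", 9), rule("bob", 7))  # bob's limit is 5 + 3 * 1 = 8
```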

There you have it! It came out as a bit of a ramble, but what the heck, this is a blog, not a research paper 🙂


Comments


  • (hugs self, makes squealing sound, taps feet)

  • @Kent 🙂 I’ve been making those squealing sounds for a while too…