
Security Analytics: Platform First or Content First?

By Anton Chuvakin | September 06, 2017 | 10 Comments


Other security bloggers write posts of general interest to the community (like posts on why “security ROI” is shit which reminds me of my 2007 post on the same topic or posts on how MalwareTech is doing), but I am sticking to esoteric detection engineering and security operations stuff because… I dunno…. it is just more fun for me. Furthermore, I feel like I already spouted a lot of broad generalities on the fate of infosec in the past and our beloved domain of work (NEW!! Now with cyber!) is such a Groundhog Day anyway (patch faster or malware will get you – circa 2013).

So, here is an esoteric debate to have: what is a bigger challenge for large scale security data analysis efforts …. scalable platform or effective detection content?

My buddy Rocky will perhaps disagree (well, I know he will!), but I see too many organizations sitting next to their shiny new security data lakes and contemplating their lack of threat detection. One representative quote was “In 2017, we used Hadoop to build a SIEM of 1998” [just as smart for threat detection, but for sure with better scalability!]. Hence my vote goes to…


All in all, it reminds me of a late 1990s debate about whether a commercial IDS should ship with signatures (no, really, it did happen!). Back then, some really smart people (they know who they are) opined that “best IDS signatures are written by clients who know their own environments well” hence “we just need to ship a NIDS engine and some sample sigs.” Guess what? The vendors led by said smart people all failed, and those who shipped lots of signatures prospered.

As a funny aside, a few weeks ago I saw a press release from a vendor promising “100 million EPS platform” (!). My first reaction? This will create a mother of all data lake failures (a big data “FAILK”? Or [as was suggested by my colleague] “a data FLAILK”?) if they don’t ship detection content to go with it.

Along the same lines, today I usually discourage clients from planning to use general purpose data analytics tools for security. Sure, it can work, but the amount of work is often staggering. Smart detection content is hard, and for simple detection content, you can just buy a box.
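To make "simple detection content" a bit more concrete, here is a minimal sketch of one such rule: a burst of failed logins from a single source. Everything in it is an assumption for illustration only: the event field names (`time`, `src_ip`, `action`), the five-minute window, and the threshold of five are hypothetical and not taken from any product. Even this toy rule assumes parsed, time-ordered events, and it says nothing about the tuning, documentation, and triage workflow that real content needs.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical rule parameters; real values need tuning per environment.
WINDOW = timedelta(minutes=5)
THRESHOLD = 5


def detect_failed_login_bursts(events, window=WINDOW, threshold=THRESHOLD):
    """Yield (src_ip, time) whenever one source reaches `threshold`
    failed logins within `window`. Events must be sorted by time."""
    recent = defaultdict(deque)  # src_ip -> timestamps of recent failures
    for event in events:
        if event["action"] != "login_failed":
            continue
        ts, src = event["time"], event["src_ip"]
        failures = recent[src]
        failures.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while failures and ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            yield src, ts
            failures.clear()  # reset so one burst produces one alert


if __name__ == "__main__":
    start = datetime(2017, 9, 6, 12, 0, 0)
    sample = [
        {"time": start + timedelta(minutes=i), "src_ip": "10.0.0.1",
         "action": "login_failed"}
        for i in range(5)
    ]
    for src, ts in detect_failed_login_bursts(sample):
        print(f"ALERT: failed-login burst from {src} at {ts}")
```

The platform question (where the events live and how fast you can scan them) sits entirely outside this function. The rule itself is the content, and writing hundreds of better ones is where the staggering amount of work goes.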

OK, so right about here an astute reader will say, “But Anton, you whine that people buy too many boxes, but here you suggest they get another box rather than adapt the tooling they have.” Sure, good observation! However, while some organizations can run super-impressive DIY security analytics efforts, many cannot even operationalize a box they purchased. Indeed, we are talking about different maturity levels of organizations.

In fact, for the majority of organizations, BOTH “scalable platform” AND “smart detection content” are too hard. However, lately I’ve seen too many enlightened organizations that succeeded with the scalable platform part only to then fail with detection logic ….


The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.



  • I can’t help but point out that you’ve asked the classic start-up pivot to failure: “Do we build the most powerful technology and look for a use case or address a use case and incrementally improve the tech?”

    I have never been a fan of “get the coolest tech and figure out what to do with it”, but the past few years have shown that vendors can drown out value discussions because their solutions use Hadoop, Mongo, or [insert Big Data tech here].

    Your data lake is going to resemble 1969 Lake Erie if you just send it everything (faster!) before you know what you want to analyze.

  • Taylor Lehmann says:

    Use cases define data needs.

    Hunt topics define data needs.

    The reverse should not be true.

  • Content was always the most valuable part of any SIEM deployment. And expertise is indeed rarer than ever these days, as many people have left the SIEM space in favor of UEBA/ML/AI etc. Second most important is time. Creating truly useful SIEM content takes more than just the coding. What about the 1-2 hours it takes to write proper documentation? Tuning and workflow guides? QA and performance tests in a real SOC environment? Efficiency tracking? Quality updates? I strongly believe that all the “SIEM failure” stories are a reality because content was always delivered as a consulting gig, and guess what happens after the consultant leaves the building? The main questions I heard from every prospect and customer back in my SIEM sales days were “What use cases do you have? Where can I see them?”. And since every integrator and consulting shop treats the SIEM content it develops as crown jewels, I never had an answer to those questions. It is important to understand that there are 2 kinds of SIEM content: company-specific content tied to business logic, and threat-centric content aimed at specific threats. The first kind is consulting, the second kind is a shareable resource! So how do we establish sharing guidelines? Well, for my part I’ve gathered a team of ArcSight, QRadar and Splunk SIEM engineers so we can go beyond SIEM tech limitations and focus 100% on the task at hand (threats!). That’s how the cross-SIEM use case library was created. Next week we go live with the 2.0 version, where documentation will be automated as well. Be part of the story: P.S. There’s free access for educational institutions and researchers.

  • Furkan says:

    What do you think about project Sigma? It is a project that tries to standardize detection content for SIEMs, like Snort does for IDPS and Yara does for files.

  • Dominique Brezinski says:

    Within the comments there is a bit of conflation between data and analytics: they are not the same, and both are deeply important. There is a dogmatic adherence to use cases, but I have seen an over-focus on use cases make these projects fail just as often as an over-focus on platform.

    There are four factors for success:

    – data about events in the environment that is semantically sufficient to indicate a broad set of security threats.

    – analytics using the available data that surface indications of the set of security threats relevant to the environment, while keeping the total volume of true and false positives within the work capacity of the security team.

    – a platform that has sufficient capability and scale to implement the analytics over the volume of data.

    – a platform that has sufficient capability and scale to provide ad hoc search/query and exploratory analysis over the volume of data with execution performance suitable for interactivity.

    If the solution does not deliver on all four factors, it will fail to deliver value across the detection-response continuum. Unfortunately there are few people with understanding of what analytic techniques best apply to threat detection, and therefore what data is necessary and sufficient. Many environments over collect the wrong data and under invest in the right data, which forces higher-scale requirements from the platform.

    Common pitfalls:
    – fraud detection and intrusion detection have very little relation in the analytic/data space
    – north-south network monitoring is insufficient to detect many aspects of modern post-exploitation activity and insider threats
    – actionable detection requires determining the entities involved and mapping the appropriate changes that can be made to the entities and the impact on remediating the detected threat. The information and process either needs to be automated or specified in a runbook. Anything else is just making noise for the team with no clear path to risk mitigation.
    – statistical techniques are not generalizable across the spectrum of intrusion detection. ML and other statistical techniques are at best point solutions for sub-problems in detection.

    • Thanks a lot for your super-insightful comment and sorry for being a slow-responding slacker here.

      To me, the main gem of your post is this baby here “there are few people with understanding of what analytic techniques best apply to threat detection, and therefore what data is necessary and sufficient” — indeed, people who understand neither platforms nor detection content are most common.

      Still, perhaps I am reading too much into your response, but I think you are ‘voting’ for a primacy of detection content over platform. Right?