
Rule-Based Detection?

By Anton Chuvakin | April 30, 2019 | 18 Comments

Tags: security, philosophy, monitoring, detection

One of the famous insults that security vendors use against competitors nowadays is “RULE-BASED.” In essence, if you want to insult your peers who, in your estimation, don’t spout “AI” and “ML” often enough, just call them “rule-based” 🙂

Sure, OK, we all can laugh at claims of “cyber AI” (and we do, often), but what is the reality layer under this? I suspect there is a spectrum that may be worth thinking about…

First, here is a Snort rule (source):

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"PUA-ADWARE Lucky Leap Adware outbound connection"; flow:to_server,established; content:"/gdi?alpha="; fast_pattern:only; http_uri; content:"|0D 0A|Cache-Control: no-store,no-cache|0D 0A|Pragma: no-cache|0D 0A|Connection: Keep-Alive|0D 0A 0D 0A|"; content:!"Accept"; http_header; content:!"User-Agent:"; http_header; metadata:impact_flag red, policy balanced-ips drop, policy security-ips drop, ruleset community, service http; reference:url,www.virustotal.com/en/file/43c6fb02baf800b3ab3d8f35167c37dced8ef3244691e70499a7a9243068c016/analysis/1395425759/; classtype:trojan-activity; sid:30261; rev:7;)

Nobody sane will deny that this is “rule-based” threat detection; this is a NIDS signature. The same logic applies to tools that do threat intelligence (TI) matching against logs and traffic – even though TI indicators are not exactly signatures.
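For illustration, TI matching is conceptually just a set lookup; here is a minimal sketch (the field names and the indicator set are hypothetical, and a real tool would pull indicators from a TI feed):

# Minimal sketch of TI matching: flag log events whose destination IP
# appears in a threat-intel indicator set. The example IPs are from the
# RFC 5737 documentation ranges; field names are hypothetical.
TI_BAD_IPS = {"203.0.113.7", "198.51.100.23"}

def ti_match(log_event: dict) -> bool:
    return log_event.get("dst_ip") in TI_BAD_IPS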

The defining characteristics of a signature are (I think they are – we people with big egos often forget to add “I think” to our positions):

  1. Focuses on “known bad”
  2. Describes specific badness
  3. Names the exact type of badness
  4. Latches onto precise characteristics of badness behavior and/or nature (note the behavior and/or nature part!)
  5. (anything else I missed?)

Now, how about this example:

IF Application_Protocol = FTP AND Destination_IP_Class=external AND Data_Transfer_Volume > 10MB THEN <ALERT>

Is this a rule? I’d say this is a rule, but probably not a signature. I think the essential characteristics are:

  1. Focuses on expected badness, but perhaps not on exact “known bad”
  2. Latches onto broad characteristics of badness behavior and/or nature

Latching onto the precise nature of badness is gone.
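To make the contrast with the Snort signature concrete, here is a minimal sketch of the above rule in Python (the field names are hypothetical; a real implementation would consume flow records from a collector):

# Minimal sketch of the FTP-volume rule above; field names are hypothetical.
TEN_MB = 10 * 1024 * 1024

def ftp_volume_rule(flow: dict) -> bool:
    # Alert on any large FTP transfer to an external destination
    return (
        flow["app_protocol"] == "FTP"
        and flow["dst_ip_class"] == "external"
        and flow["bytes_transferred"] > TEN_MB
    )

Note that nothing in it names a specific piece of badness; it just encodes expected badness.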

OK, how about this?

IF Application_Protocol = FTP AND User_Group=admins AND Data_Transfer_Volume > 2*(Average_User_Peers) THEN <ALERT>

Still a rule, eh? The last example references a metric, Average_User_Peers, that is presumably based on a running average (which is what we used to call it; now they just call it machine learning…). To me, the above is a rule, a pattern, or perhaps a rule-with-a-caveat. It is clear that we are entering fuzzy territory here. Purists start to cringe. Cyber AI appears.
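A sketch of how such a rule-with-a-baseline might look, assuming the running average is an exponentially weighted moving average kept per peer group (all names here are hypothetical):

# Sketch of the peer-baseline rule; the "running average" is an exponentially
# weighted moving average per user group (all field names are hypothetical).
from collections import defaultdict

ALPHA = 0.1                    # smoothing factor for the running average
peer_avg = defaultdict(float)  # average transfer volume per peer group

def update_baseline(group: str, volume: float) -> None:
    peer_avg[group] = (1 - ALPHA) * peer_avg[group] + ALPHA * volume

def admin_ftp_rule(flow: dict) -> bool:
    # Alert when an admin FTP transfer exceeds double the peer-group average
    return (
        flow["app_protocol"] == "FTP"
        and flow["user_group"] == "admins"
        and flow["bytes_transferred"] > 2 * peer_avg[flow["user_group"]]
    )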

What about a robot-written rule? Say some unsupervised ML logic reveals that FTP data transfers larger than double the average among user peers are 77.3% likely to be malicious. We are well into fuzzy territory here! Purists freak out. Cyber AI frowns at you. Is an algorithm-written rule a rule? Now we enter the very philosophical core of the fuzzy territory…
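As a toy illustration (emphatically not how any particular product does it), a machine could “write” such a rule by sweeping candidate thresholds over historical data and keeping the one with the best precision:

# Toy illustration of a machine-derived rule: sweep thresholds over historical
# transfer ratios (volume / peer average) and keep the one whose flagged flows
# have the highest fraction later confirmed malicious. Purely hypothetical.
import numpy as np

def derive_threshold(ratios: np.ndarray, malicious: np.ndarray) -> float:
    best_t, best_precision = 1.0, 0.0
    for t in np.arange(1.0, 5.0, 0.1):
        flagged = ratios > t
        if not flagged.any():
            continue
        precision = malicious[flagged].mean()
        if precision > best_precision:
            best_t, best_precision = t, precision
    return best_t  # e.g., ~2.0 at ~0.773 precision would yield the rule above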

Finally, what about a supervised ML classifier trained on a vast corpus of badness (naturally, all “known bad,” by definition) and goodness? Few would claim this is a rule, but admittedly it is related to “known bad” in some way, no? Cyber AI smiles at you.

I think the essential characteristics here would be:

  1. Focuses on badness similar in some mathematically measurable way to “known bad”; this is “derived from known bad”, rather than “known bad”
  2. Latches onto characteristics of badness behavior and/or nature visible to an algorithm, but perhaps not to a human.
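A toy sketch of that last case (scikit-learn here; the features and the four-sample “corpus” are made up purely for illustration):

# Toy supervised classifier: train on labeled "known bad" / "known good"
# samples; the features ([bytes, duration_sec, is_external, hour]) are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X_train = np.array([
    [12e6, 30, 1, 3],    # known bad
    [2e5, 5, 0, 14],     # known good
    [50e6, 120, 1, 2],   # known bad
    [1e5, 2, 0, 10],     # known good
])
y_train = np.array([1, 0, 1, 0])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# The verdict on a new flow is "similar to known bad", not a named signature hit.
print(clf.predict_proba([[30e6, 60, 1, 4]])[0][1])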

As we ponder further, another way to look at this is perhaps:

Threat type     | Method that works          | Method that doesn’t
Known known     | Signatures, supervised ML  | N/A
Known unknown   | Rules, supervised ML       | Signatures
Unknown unknown | Praying 🙂                 | Rules and signatures

Dragos’ excellent treatise on four detection types (“The Four Types of Threat Detection”) elegantly differentiates between “indicators” (called signatures in this post) and “threat behaviors.”

The latter may ultimately be RULES as well, and there is nothing offensive about that. Recalling Bianco’s Pyramid of Pain, a rule may apply at any level, from bad IPs and file hashes (very much signatures) to TTPs and attacker tradecraft. Sometimes, the rule-based approach rules!

So.

Lessons:

  • next time you get into a bar fight over “is this signatures or behavior?”, say YES and walk away
  • there is nothing wrong with being rule-based, in many cases
  • rules work at many levels of abstraction, and they are more resilient / less fragile at higher levels
  • (anything else I missed?)


The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.


18 Comments

  • Shawn Riley says:

    There are also knowledge-engineering-derived AI expert systems that can apply simple IF-THEN-ELSE conditional rules, with or without mapping the data to ontological knowledge models. After mapping, they can also apply SWRL reasoning rules based on the meaning of the data, supporting both ontology-based and rule-based inference from any knowledge encoded in the knowledge base. This is great for applying deterministic reasoning to detect adversary behavior, like MITRE ATT&CK techniques, and then drawing deductive inferences about the adversary objective and the stage of the cyber attack lifecycle, the NIST impact, and recommended and taken courses of action.

  • Glenn says:

    Anton, once again you have boiled it down to the basics, whether rule-based, AI-based, or abstract-based. Amazed that I can still absorb many of your thoughts, as I have been digesting them since the ’90s.

  • One other way to think about this: how many signatures would I need to cover a particular threat landscape? How many rules would I need? How many “AI models” would I need? The number of signatures is huge, the number of rules moderate, the number of AI models small.

    • Indeed, good point re: the number of rules needed for coverage. There are problems with rules, for sure, and I wanted to kinda focus on the positive side, which is often forgotten.

      • Totally agree. A moderate number of rules is far more tractable than a huge number of signatures. And well-written rules can cover a broad swath of interesting behaviors without the need to procure large numbers of positive and negative samples to train an ML model.

  • Saumitra Das says:

    Very interesting discussion.

    My two cents here is that everything can be defined as a set of rules, but how can we define and discover a good “rule set” that can work with high precision and recall? If we frame the problem statement this way, it provides another perspective on when to use what.

    Problem complexity: For any reasonably complex problem, humans certainly won’t be able to do this, and there is hard evidence of this in multiple computer science disciplines. Only a machine that can look through enough diverse data will be able to automatically learn statistical patterns that can then be applied to produce highly accurate results. This is even more applicable if the underlying distributions of the data keep evolving, e.g., a million new threats a day.

    Features: Human-defined rules can work on a few variables, but beyond that, reasoning about thresholds becomes non-trivial. AI models can encode “rules”, or rather “decision boundaries”, involving complex functions of tens of thousands of variables. Traditional machine learning would discover the functions given variables of interest, whereas deep learning can discover better functions (given enough diverse, high-quality data) as well as discover the variables of interest and the best representation of the data. Again, one may want to have the variables (features) change as the data evolves, not just the functions (rules) around the current variables. This subtle difference helps with “newness” in attacks. When we talk about latching onto characteristics of badness behavior, it matters what we are exposing for it to latch onto, and whether we are allowing it to latch onto whatever makes sense based on the data or limiting the machine to human-engineered variables.

    Alert fatigue: Do we want lots of alerts based on “suspicious” behavior from rules, or fewer but higher-fidelity alerts? While signatures don’t generalize well, they don’t create a sea of alerts. Rules have to be carefully applied to avoid this issue.

    On the flip side, there are cases where rules make a lot of sense. Three cases come to mind: (1) if an attack is very specific, e.g., a specific type of payload sent to a specific service to exploit a vulnerability, then there is no generalizable pattern for AI to learn; (2) if not enough data exists to train a model on either the positive or negative side, or both; (3) if the nature of the problem is not very complex, e.g., if in real-world data transferring 2x more bytes than average peers is the primary method of data exfil and doesn’t generate FPs, then the problem is more elegantly solved with a rule rather than by forcing AI into it.

  • Gunter Ollmann says:

    The lines are indeed blurry. A few years ago I described ML classifiers and detection capabilities in terms of multidimensional signatures, which resonated with both CISOs and SecOps teams. The two-part discussion is over on DarkReading – https://www.darkreading.com/attacks-breaches/machine-learning-in-security-good-and-bad-news-about-signatures/a/d-id/1324888

    • Thanks a lot for the comment and the link. Indeed, people who assume that “AI”/ML vs rules is black vs white are bound to be disappointed 🙁

  • Shawn Riley says:

    More advanced rules-based systems are rising on the hype cycle as part of Gartner’s Digitized Ecosystem.

    Emerging technologies require revolutionizing the enabling foundations that provide the volume of data needed, advanced compute power and ubiquity-enabling ecosystems. The shift from compartmentalized technical infrastructure to ecosystem-enabling platforms is laying the foundations for entirely new business models that are forming the bridge between humans and technology.

    This trend is enabled by the following technologies: Blockchain, Blockchain for Data Security, Digital Twin, IoT Platform and Knowledge Graphs.

    “Digitalized ecosystem technologies are making their way to the Hype Cycle fast,” said Walker. “Blockchain and IoT platforms have crossed the peak by now, and we believe that they will reach maturity in the next five to 10 years, with digital twins and knowledge graphs on their heels.”

    This was also observed by Forrester who recently said 10% of enterprises implementing AI applications will add knowledge engineering to the mix—human wisdom and expertise—to “extract and encode inferencing rules and build knowledge graphs from their expert employees and customers.”

    Inferencing rules and knowledge graphs go hand in hand. If you’re not using inferencing rules, then you’re probably using a property graph, which is a different type of graph that supports link analysis instead.

  • Edward Wu says:

    All ML models can be boiled down to a set of decision boundaries and ultimately converted into a logical process, which can be expressed in a Turing-complete language. So one could argue an ML model is just a collection of nested if statements with complicated conditions. In this case, there is no clear line separating rule and ML in terms of numbers of if statements. (A toy illustration follows at the end of this comment.)

    One way to distinguish rules vs. behavioral is how much data the implementation inspects in order to make the detection. Typically, rule-based detectors focus on a single packet/transaction and/or a small number of properties and aggregated counts, while behavioral detectors consider many more features and datapoints across time (weeks and months) and space (entities such as devices and users).

    Another way to separate rules vs. behavioral is whether the implementation is self-learning (zero-configuration) and dynamically adapts to a specific network and organization. I think in general customers expect behavioral detectors to be turnkey and adaptive based on observed behaviors and/or user feedback, while rule-based detectors are generally more rigid and require manual tweaking.
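    To make the first point concrete, a trained decision tree prints as exactly such nested if statements (a toy sketch; the single feature and the data are made up):

    # Toy sketch: a trained decision tree is literally nested if statements.
    # The single feature (bytes_transferred) and the data are made up.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = np.array([[2e5], [1e6], [15e6], [40e6]])  # bytes_transferred
    y = np.array([0, 0, 1, 1])                    # 1 = malicious

    tree = DecisionTreeClassifier(max_depth=1).fit(X, y)
    print(export_text(tree, feature_names=["bytes_transferred"]))
    # Prints something like:
    # |--- bytes_transferred <= 8000000.00
    # |   |--- class: 0
    # |--- bytes_transferred >  8000000.00
    # |   |--- class: 1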

    • Thanks a lot for an insightful comment!

      Indeed, “there is no clear line separating rule and ML in terms of numbers of if statements” is accurate, but also very annoying to the ‘we are ML and NOT rules’ crowd 🙂

      • Edward Wu says:

        At the end of the day, both have their place in a modern tool. It doesn’t make sense to build a DNN that performs regex matching against a known-bad user agent, while it’s impossible to manually come up with a rule that can infer the role of an entity within an organization and alert on unexpected behavior.

        • Edward Wu says:

          Also, ML is critical when we get to the parts of threat detection where the definition of “threat” varies greatly between different organizations and even network segments.

  • Nichols says:

    Anton, my two cents is that it’s better to have one detection in hand than two detections flying. Security events related to maliciousness can be deterministic (this is really known bad, given previous experience) or probabilistic (this can be evil, based on ML, mathematics, or some previous pattern that can indicate something is wrong), and both are useful and needed for threat detection, hunting, whatever. IMHO, as you get more mature, you gain a better capability to detect stealthier threats that were passing below the radar, you can calibrate your sensors better, and you can tune the rules down to a level that is feasible for your resources (human, storage, network, systems) to treat properly. So, instead of thinking that one method is better than the other (as some security guru in this galaxy did), it will be more productive to design a threat detection strategy according to your threat model and risks, combining methods like Lego to achieve your detection objectives.

    • Thanks for the comment. I’d say that MOST orgs have no idea what “design a threat detection strategy according to your threat model and risks” means in PRACTICAL terms…

      • Nichols says:

        I don’t want to romanticize or be a security demagogue, but it appears that many security strategists (at least by job title or LinkedIn name) have not done their homework. Papers like yours from Gartner (it’s true, I’m a current reader) and other materials/training/readings/conferences, plus hard work, can help make this practical and make security more of a science (not necessarily an exact science, since the attacker component ups the game all the time) than just a bunch of guesses and a continuous bet on a horse race.