Better Data or Better Algorithms?

By Anton Chuvakin | October 04, 2016 | 1 Comment

An eternal question of this big data age is: what to choose, BETTER DATA or BETTER ALGORITHMS?

So far, most [but not all!] of the deception users we interacted with seem to be using their deception tools as “a better IDS.” Hence our discussion of the business case for deception in earlier posts was centered on detecting threats.

Naturally, there are many detection tool categories (SIEM, UEBA / UBA, EDR, NTA, and plenty of other yet-unnamed ones) that promise exactly that – better threat detection and/or detection of “better” threats!

During one of the recent “deception calls” it dawned on us what separates “deception as detection” from those other tools:

  • DECEPTION TOOLS rely on “better source data”, such as the attacker’s authentication logs, the attacker’s traffic, files that the attacker touched, etc.
  • MOST OTHER TOOLS rely on “better data analysis” of data such as all logins, all traffic or all files touched, etc.
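
To make the contrast concrete, here is a toy sketch of the two approaches. Everything in it is an illustrative assumption, not any vendor's method: the event format, the decoy account names, the sample data and the plain z-score threshold are all hypothetical, and real UEBA tools use far richer models.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical decoy accounts that no legitimate user should ever touch.
DECOY_ACCOUNTS = {"svc_backup_old", "admin_test"}


def detect_by_source_data(events):
    """'Better source data': any event that touches a decoy is an alert."""
    return [e for e in events if e["account"] in DECOY_ACCOUNTS]


def detect_by_analysis(events, threshold=2.0):
    """'Better data analysis': flag sources whose login volume is an outlier.

    A simple z-score over per-source login counts stands in for the
    analytics a real tool would apply to all logins.
    """
    counts = Counter(e["source"] for e in events)
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every source looks the same, nothing stands out
    return sorted(s for s, c in counts.items() if (c - mu) / sigma > threshold)


def sample_events():
    """Build a small, made-up login log for the demonstration."""
    events = []
    # Ten ordinary hosts with three logins each.
    for i in range(10):
        events += [{"source": f"host{i}", "account": f"user{i}"}] * 3
    # One noisy host with thirty logins to a normal account.
    events += [{"source": "host_noisy", "account": "user_x"}] * 30
    # One quiet host with a single login to a decoy account.
    events.append({"source": "host_quiet", "account": "svc_backup_old"})
    return events


if __name__ == "__main__":
    events = sample_events()
    print("decoy alerts:", [e["source"] for e in detect_by_source_data(events)])
    print("volume outliers:", detect_by_analysis(events))
```

In this made-up log the decoy check flags only the quiet host, which the volume statistics never notice, while the analysis flags only the noisy host, which never touches a decoy. That is the sense in which the two approaches start from different data rather than competing over the same data.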

So, can we say which one is better? Until we can have a cage match between a deception vendor and, say, a UEBA vendor, we probably won’t know for sure. The largest enterprises (the proverbial “security 1%-ers”) will “buy one of each” (as usual) and the smaller ones will wait for a product that combines both feature sets with a firewall 🙂

For example, one of the interviewees outlined an elegant scenario where a deception tool and a UBA / UEBA tool are used together. We hesitate to say that this is the future for everybody, but it was an interesting example of the “strength-based” approach to tools…

Still, “detection by better source data” has unique appeal to people who are just not willing to “explore all data.” Our contacts report “low friction”, better signal/noise, low/no “false positives” and low operational burden for deception tools [used for detection].

Hence, despite the “all data + smart algorithms” approach being perhaps philosophically superior (since looking at ALL data will theoretically allow you to detect all threats, but … can we really have ALL data?), some organizations are choosing “decoy-sourced data” and seem happy with their decisions…

1 Comment

  • Andre Gironda says:

    You’re talking about the difference between threat intelligence and friendly intelligence; between capabilities in counter-deception and counterintelligence versus run-of-the-mill data-science techniques.

    Data science is important because it reduces the size of the haystack while still allowing for a ridiculously large, fast-growing haystack. Yet most of the UEBA, UBA, NTA, EDR, and SIEM platforms utilize either non-complex relationships (N.B., I’ve yet to see a cyber intelligence platform outside of the free, open-source BloodHound platform from the Veris Group Adaptive Threat Division utilize a graph database) or, worse, only a subset of machine learning and statistical inference techniques, such as confidence intervals.

    When comparing controls, you will likely want to first model cyber risk correctly (e.g., with Open Group FAIR). Deception systems are controls in the avoidance or deterrence categories, not typical infosec vulnerability or response controls. Thus, they have an entirely different effect on the variables and boundaries of the model than UBA, UEBA, NTA, EDR, and SIEM, which are mostly focused on those also-critical responsive controls. An org would therefore want to use EP curves and risk-tolerance curves (à la How to Measure Anything in Cybersecurity Risk) with comparisons inside each control boundary: deception systems would be compared against other platforms in the avoidance and deterrence control sets (e.g., anti-fraud systems, anti-credential-stuffing, responsive IPS, reduction, segmentation, segregation, anti-tampering, et al.), while the responsive controls are pitted against each other, to determine how to prioritize spend, resource allocation, project-delivery timelines, etc.