Remember [some] NIDS of the 1990s? Specifically, those that were unable to show the packets that matched the rule triggering the alert! Remember how they were deeply hated by the intrusion detection literati? Security technology that is not transparent and auditable is … what’s the polite term for this? … BAD SHIT!
Today we are – for realz! – on the cusp of seeing security tools that are based on non-deterministic logic (such as select types of machine learning) and thus are unable to ever explain their decisions to alert or block. Mind you, they cannot explain them not because their designers are sloppy, naïve or unethical, but because the tools are built on methods and algorithms that are inherently unexplainable [well, OK, a note for the data scientist set reading this: the overall logic may be explainable, but each individual decision is not].
For example, suppose you build a supervised learning system that looks at known benign network traffic and known attack traffic (as training data) and then extracts the dimensions it considers relevant for telling one from the other. That system will NEVER be able to fully explain why a particular future decision was made. [And, no, I don’t believe such a system would be practical, for a host of other reasons, if you have to ask.] Sure, it can show you the connection it flagged as “likely bad”, but it cannot explain WHY it flagged it, apart from some vague statement like “it was 73% similar to some other bad traffic seen in the past.” Same with binaries: even if you amass the world’s largest collection of known good and known bad binaries, extract features, build a classifier, train it, etc., the resulting system may not be able to explain why it flagged some future binary as bad. [BTW, these examples do not match any particular vendor that I know of, and any matches are purely coincidental!]
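To make the contrast concrete, here is a toy Python sketch; all names, features and data below are hypothetical and do not come from any product. The rule-based check returns the rule name and the exact bytes that matched, much like a NIDS that can show you its packets. The similarity check, a crude nearest-neighbour stand-in for a trained classifier (not a real supervised model), returns only a number along the lines of “X% similar to past bad traffic”.

```python
import math

# Hypothetical features per connection: (kilobytes_out, duration_seconds, distinct_dst_ports)
KNOWN_BAD = [
    (900.0, 2.0, 40.0),   # hypothetical scan-like burst
    (15.0, 300.0, 1.0),   # hypothetical slow beacon
]

def signature_alert(payload: bytes):
    """Deterministic rule: report which rule fired and the exact bytes it matched."""
    marker = b"/etc/passwd"
    idx = payload.find(marker)
    if idx >= 0:
        return ("RULE-1: passwd file access", payload[idx:idx + len(marker)])
    return None

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_score(features):
    """Similarity-style verdict: only a number, the highest cosine similarity
    to any known-bad sample, with no rule or evidence attached."""
    return max(cosine(features, bad) for bad in KNOWN_BAD)

print(signature_alert(b"GET /../../etc/passwd HTTP/1.0"))
print(f"{similarity_score((500.0, 5.0, 20.0)):.0%} similar to past bad traffic")
```

Even this toy score can be traced by hand, because it uses three hand-picked features; a real classifier weighing many learned features is what makes individual verdicts hard to trace.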
My dear security industry peers, are we OK with that? Frankly, I am totally fine with ML-based recommendation engines (what to buy on Amazon? what to watch on Netflix?) – occasionally they are funny, sometimes incorrect, but they are undoubtedly useful. Can the same be said about non-deterministic security systems? Do we want a security guard who shoots people based on random criteria, such as those he “dislikes based on past experiences”, rather than using a white list (let them pass) and a black list (shoot them!)?
One security data scientist recently reminded me that the “fast / correct / explainable – pick any two” wisdom applies pretty well to statistical models, and those very models are now creeping into the domain of security. Note that past heuristics and anomaly detection approaches, even when complex, are substantially different from this coming wave of non-linear machine logic. You can still do those old anomaly detection computations “on paper” (however hard the math) and arrive at the same conclusion as the system. That is not the case with, for example, today’s ensemble learning (ha-ha, my candidate model just beat up your champion model!), where the exact decision logic is determined by the machine on each occasion.
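For comparison, here is a minimal sketch of the older, hand-checkable style: a z-score anomaly check on a hypothetical daily login count, with an assumed 3-sigma threshold. It is an illustrative stand-in, not any specific product’s method. Every intermediate value is returned, so an analyst can redo the arithmetic on paper and reach the same verdict.

```python
import math
import statistics

def zscore_alert(history, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` population standard
    deviations from the mean of `history` (a hypothetical baseline)."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # Flat baseline: any deviation at all is treated as anomalous.
        z = 0.0 if value == mean else math.inf
    else:
        z = (value - mean) / stdev
    # Return every intermediate number so the verdict can be checked by hand.
    return {"mean": mean, "stdev": stdev, "z": z, "alert": abs(z) > threshold}

logins = [10, 12, 11, 9, 13, 10, 12, 11]  # hypothetical daily login counts
print(zscore_alert(logins, 40))
```

With this history the mean is 11 and the population variance is 1.5, so a value of 40 gives z ≈ 23.7 and an alert; anyone with a calculator gets the same answer.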
By the way, my esteemed readers know that all of my work focuses on reality, not marketing pipe dreams and silly media proclamations (remember the idiot who said “Cyber security analytics isn’t particularly challenging from a technical perspective”?). I assure you that this is about to become a real problem!
When asked about this issue, designers of security tools that substantially rely on non-deterministic logic offer the following bit of advice: build trust over time by simply using the system. In essence, don’t push the system to any blocking [or “waking people up at 3AM”] mode until you trust it to be correct enough to whatever standard you hold dear. Do you think this is sufficient, in all honesty? Sure, some people will say “yes” – after all, most users of AV tools do not manually inspect all the anti-malware signatures, choosing to trust the vendor. But it is one thing to trust the ultimately-accountable vendor threat research team, and quite another to trust what is essentially a narrow AI.