Gartner Blog Network


Can we Trust “Black Box” Machine Learning when it comes to Security or is there a Better Way?

by Avivah Litan  |  July 27, 2017  |  4 Comments

Machine learning is relatively new to security. It first went mainstream a few years ago in a few security domains such as user and entity behavior analytics (UEBA), network traffic analytics and endpoint protection. Several vendors earned strong brand recognition by pioneering ML in those spaces. (For examples, see Forecast Snapshot: User and Entity Behavior Analytics, Worldwide, 2017; Magic Quadrant for Endpoint Protection Platforms; and Cylance SWOT.)

But when I speak with users who have adopted these tools and ask whether they know what kind of machine learning is under the hood of their acquired software, the typical response I receive is: “not sure – we just know that it works.”

Is this response good enough to take machine learning mainstream in security? Will generally skeptical security professionals trust their vendors to ascertain risk? Will they rely on black box ML software to kill processes or kick users off a system?

The resounding and obvious answer to these questions for most enterprises is a FLAT NO.

Vendors can’t sell black boxes. Users need to understand what a machine learning model is doing and how they themselves can manage, control and tune the results as needed.

Welcome to Automatic Generation of Intelligent Rules

At least two fraud vendors – DataVisor and ThreatMetrix – have come up with an innovative approach to this black box dilemma for their fraud management clients. That is, their machine learning engines automatically generate rules using attributes provided by their ML models.

In the case of DataVisor, its unsupervised machine learning engine automatically identifies attributes of coordinated attack campaigns and creates new rules daily through its Automated Rules Engine. DataVisor continuously monitors the automatically generated rules for relevance and retires those that become obsolete.
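DataVisor has not published the internals of its engine, but the general idea is easy to sketch: group events that share the same attribute fingerprint, treat unusually large groups as coordinated campaigns, turn their shared attributes into a human-readable rule, and retire rules that stop firing. The attribute names, thresholds and Rule structure below are entirely hypothetical illustrations, not the vendor's actual implementation.

```python
# Minimal sketch of unsupervised "coordinated campaign" detection feeding a rules engine.
# All attribute names, thresholds and the Rule structure are hypothetical illustrations,
# not DataVisor's actual implementation.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rule:
    """A human-readable rule derived from a suspicious cluster of events."""
    conditions: dict   # attribute -> required value
    created_day: int
    hits: int = 0

    def matches(self, event: dict) -> bool:
        return all(event.get(k) == v for k, v in self.conditions.items())


def generate_rules(events: list, min_cluster_size: int = 50, day: int = 0) -> list:
    """Group events that share the same attribute fingerprint; unusually large groups
    look like coordinated campaigns, and their shared attributes become a rule."""
    clusters = defaultdict(list)
    for e in events:
        fingerprint = (e["email_domain"], e["user_agent"], e["ip_prefix"])
        clusters[fingerprint].append(e)

    rules = []
    for (domain, agent, prefix), members in clusters.items():
        if len(members) >= min_cluster_size:
            rules.append(Rule(
                conditions={"email_domain": domain, "user_agent": agent, "ip_prefix": prefix},
                created_day=day,
            ))
    return rules


def retire_stale_rules(rules: list, min_hits: int = 1) -> list:
    """Drop rules that no longer fire -- the 'monitor for obsolescence' step."""
    return [r for r in rules if r.hits >= min_hits]
```

The value of surfacing rules this way is that an analyst can read, tune or disable each one rather than having to trust an opaque score.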

ThreatMetrix’s Smart Learning engine works much the same way, albeit based on a supervised (rather than unsupervised) machine learning model that gets its ‘truth’ data from the company’s Digital Identity Network.

This gives these vendors’ customers the ability to manage and tune machine-generated rules. This ‘clearbox’ approach – a term coined by ThreatMetrix – takes the mystery out of machine learning. Indeed, at the end of the day, machine learning models generate a set of rules that implement their logic.
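Neither vendor has disclosed exactly how its models are translated into rules, but the idea is easy to illustrate with a supervised example: train a decision tree on labeled ‘truth’ data, then walk each root-to-leaf path and emit it as a rule an analyst can read, tune or disable. The sketch below uses scikit-learn and entirely hypothetical feature names; it illustrates the concept only and is not either vendor’s implementation.

```python
# Sketch: extract human-readable rules from a supervised model (decision tree).
# Feature names and data are hypothetical; this only illustrates the "clearbox" idea
# that a trained model's logic can be surfaced as rules analysts can inspect and tune.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, _tree

FEATURES = ["login_velocity", "device_age_days", "geo_mismatch"]

# Toy labeled data standing in for 'truth' data (1 = fraud, 0 = legitimate).
rng = np.random.default_rng(0)
X = rng.random((500, 3)) * [20, 365, 1]
y = ((X[:, 0] > 15) & (X[:, 2] > 0.5)).astype(int)

model = DecisionTreeClassifier(max_depth=3).fit(X, y)


def tree_to_rules(tree, feature_names):
    """Walk every root-to-leaf path and emit it as an IF ... THEN rule string."""
    t = tree.tree_
    rules = []

    def walk(node, conditions):
        if t.feature[node] == _tree.TREE_UNDEFINED:       # leaf node
            label = "FRAUD" if t.value[node][0].argmax() == 1 else "OK"
            rules.append("IF " + " AND ".join(conditions or ["TRUE"]) + f" THEN {label}")
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node], conditions + [f"{name} <= {thr:.2f}"])
        walk(t.children_right[node], conditions + [f"{name} > {thr:.2f}"])

    walk(0, [])
    return rules


for rule in tree_to_rules(model, FEATURES):
    print(rule)
```

Each printed rule is something a fraud analyst can inspect and override, which is the control that a black-box score alone does not provide.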

Security vendors would be well advised to incorporate such a clearbox approach into their products. By doing so, they would win much broader adoption among justifiably skeptical security managers who need more control over security policies and actions.

Lessons from Google Search

We should all take a lesson from Google, which is gradually moving toward having AI manage its search engine.

Google executives have traditionally resisted using machine learning inside the search engine for good reason: it is often difficult to understand why neural nets behave the way they do, which makes it much harder to manage and refine search behavior and results.

Nonetheless, as Google gains greater insight into its AI engines, it will rely on them more and more. Larry Page of Google put it best:

“Artificial intelligence would be the ultimate version of Google. The ultimate search engine that would understand everything on the web. It would understand exactly what you wanted, and it would give you the right thing. We’re nowhere near doing that now. However, we can get incrementally closer to that, and that is basically what we work on.”

Security’s future?

Security works much the same way. It would be ideal for AI to automatically ascertain our security postures and to immediately and continuously correct the errors and vulnerabilities that it finds.

Those days are far off, however. In the meantime, all we can do is try to understand what our AI and machine learning engines are doing, and learn to control them rather than the other way around.


Avivah Litan
VP Distinguished Analyst
12 years at Gartner
30 years IT industry

Avivah Litan is a Vice President and Distinguished Analyst in Gartner Research. Her areas of expertise include financial fraud, authentication, access management, identity proofing, identity theft, and fraud detection and prevention applications.


Thoughts on Can we Trust “Black Box” Machine Learning when it comes to Security or is there a Better Way?


  1. Since the early days of computers we have used them for problems that we did not know how to answer but where we could test the solution when we saw it. In the early days we used statistics or even brute force to solve problems that we did not know how to solve analytically. There are other problems where good-enough solutions are better than none, and some, as in Jeopardy, where we can tolerate false positives.

    Turing taught us that it was easier to eliminate non-keys than to identify THE key. The Bombes produced a lot of false positives; the more errors in the ciphertext, the more false positives. They were still useful.

    There are problems like this in security to which we can apply AI. Real-time traffic analysis is such a problem.

    The issue is not whether we can trust black-box AI but for what set of problems we can use it.

  2. Bob Huber says:

    Whether you call it blackbox or clearbox matters not. The question is whether the engines provide enough fidelity to take action, or enough context to perform more informed investigations. And yes, if you have products running AI/ML or similar techniques, you need to understand how they work so you know your exposure (false positives, negatives, time drains, etc.). AI-based products send most teams down a rabbit hole trying to ascertain whether the activity is indeed good or bad. Most organizations don’t have the resources or expertise for this. If you have a large organization with hunt or discovery teams, this can prove valuable for finding as-yet ‘unknown’ activity.

  3. Avivah Litan says:

    Thanks for these comments William and Bob. You raise some very good points that expand our understanding.

  4. Daniel Bago says:

    Operating as a PAM vendor, we regularly get this question or concern from clients: how can we trust the results of your behavior analysis module? As it seems to be a crucial and frequently asked question nowadays, we devised a solution that can automatically evaluate the effectiveness of unsupervised machine learning algorithms. If you are interested in the concept behind it, you can learn more here:
    https://www.balabit.com/blog/how-to-evaluate-unsupervised-anomaly-detection-for-user-behavior-analytics/
    https://www.balabit.com/blog/performance-metrics-for-anomaly-detection-algorithms-in-security-analytics/
    https://www.balabit.com/blog/novel-performance-metrics-anomaly-detection-algorithms/



