Source IP. Destination IP. Source port. Destination port. Network protocol. Connection time. A bit more context data.
Is this enough to detect “an advanced threat”? Before you jump to conclusions, let’s have a productive discussion here. Some context is required to make it just such a discussion.
Here is where it started:
Detecting *REAL* advanced threats/malware with just network flow data [L3] + fancy algorithms. Believable in principle Y/N?
— Dr. Anton Chuvakin (@anton_chuvakin) October 21, 2015
First, this is NOT a discussion on whether netflow-type data is useful for “cyber” (it is!); it is also NOT a discussion on whether it is the best way to detect said threats (it isn’t, having full network and full endpoint data is). However, it is a discussion on whether having only shallow header data and no payload (no application layer / L7, no HTTP, no DNS details, no raw PCAP, etc) – and NO endpoint data gives us a decent shot – a shot worth taking, essentially. Is netflow a shot worth taking?
Second, what is advanced? We should discuss it, ideally, but I’d rather not. Think Duqu 2.0, Flame, Stuxnet, better Easter European cybercrime wares, and of course the original APTs (example). So, I am setting the bar a bit higher than just “crap that your silly AV misses.”
So, the arguments IN FAVOR:
- Flows can give internal network visibility (think lateral movement, internal recon, staging, etc) that is often impossible to get with logs and hard to get with full traffic capture (need too many capture points)
- Flow information is (?) in fact sufficient in some cases (“what do you mean 123GB of DNS traffic left our network tonight?”)
- Flow information processed by some magical (ML or otherwise) increases otherwise low information density of flow data and can lead to great insights and detections
- If you happen to possess a surprisingly high level of awareness of what is normal on your network (such as on OT, ICS / SCADA, etc networks), flow is all you need
- If you accept a high rate of false positives, an occasional flow-based insight may prove valuable and difficult to obtain by other means (or: only obtainable from more onerous methods such as full traffic capture)
- Flows matched with good threat intel will lead to useful detections (an easy counter: real advanced threats do not show up in TI feeds)
The arguments AGAINST:
- Flows never produce enough certainty to give a credible conviction (bad/ not bad) for any activity; many “this is certainly bad” activities often end up being legit (yes, our IT today is that weird)
- To detect with flows, you need to have a decent idea of what is normal and this is often (always?) impossible on messy real-world networks – and it changes too rapidly
- Not having layer 7 stuff (user-agent, HTTP response code, mail subject FTP username, etc) is a death knell for detecting modern application layer attacks
- A rate of false detection will be too high, no matter what algorithm is used to process the flow streams; you need L7 – a real advanced threat will hide from L3 (=flow) detection among the legit traffic
- Flows are great to validate what you detected by other means (“ah, so they connected to X after their browser exploit worked”), but not as primary/initial means of threat detection
- Lack of user awareness really kills it; real advanced attacks use legit credentials and flows are useless for that.
There you have it – got anything useful to add? Use comments or twitter thread …
P.S. This is a random fun discussion, not aimed at any particular vendor, etc