One of the key uses for threat intelligence (TI) data is making better threat intelligence data out of it. Some people go fancy and call it “threat intel fusion” and I like the term, maybe because it has not been hijacked by the marketers yet.
So, threat intelligence fusion. I define this as simply a process of making better intelligence out of existing intelligence by enriching, linking, validating, contextualizing and otherwise growing the depth or breadth of available threat intelligence data sets.
For example, if you had a report that such and such panda may want your stuff and also a separate list of “bad” IP addresses, relating them to each other (such as that said panda eats shoots and leaves with your stuff in a .zip file in the direction of an IP address 10.10.10.10) creates better intelligence and [hopefully] enables you to better triage the events you observe and incidents you handle in your environment.
How do we build a generic map of a TI fusion process that people can actually follow? “Tricky problem here you have” – Yoda might say 🙂 On the other hand, a doge may say “oh wow. much threat data. so delicious.” But I digress….
So, a given: a pile of threat data (IPs, hashes, domains, other indicators), but also reports on threat actors and their capabilities, interests and infrastructure. Many feeds arrive daily, threat reports flood the inboxes, rumors are whispered and threat portals are alight with news…
A goal: useful and relevant TI. Or at least more useful and more relevant TI.
To achieve our goal, we may apply operations similar to these:
- Link: store TI data in a way that lets you search all of it at once, such as for an IP address (and no, you don’t have to use Hadoop); even such naive linking (show me all records from all data sources with the same IP) is useful and delivers insights above and beyond those of individual feeds stored independently
- Enrich: for each external IP, get Whois data, geo, file reputation, etc.; such tactical TI data enrichment enables organizations to do linking and relating better and also provides a way to validate weak TI signals
- Relate: link IP to domain to URL to malware sample (that connects to and comes from an IP) in order to discover new threat activities and expand the scope of your response process; many other relating operations are possible
- Validate: matching TI to known local black- and white-lists enables you to promote or discard some pieces of intelligence
- Contextualize: relate the data to local observations, events, incidents, triage activities in order to make TI data more relevant (for example, anything linked to the data that is linked to a threat actor seen on your network will be more relevant)
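To make the “link” and “validate” operations above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the feed names, the record layout, the `LOCAL_WHITELIST` set and the function names `build_index`, `link` and `validate` are all hypothetical, and a real TI store would be more involved than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical records from several TI sources: (source, indicator type, value).
RECORDS = [
    ("feed_a", "ip", "10.10.10.10"),
    ("feed_b", "ip", "10.10.10.10"),
    ("feed_b", "domain", "bad.example"),
    ("report_panda", "ip", "10.10.10.10"),
    ("feed_a", "ip", "192.0.2.7"),
]

# Hypothetical local whitelist of values known to be benign in your environment.
LOCAL_WHITELIST = {"192.0.2.7"}


def build_index(records):
    """Index every record by (type, value) so all sources are searchable at once."""
    index = defaultdict(list)
    for source, kind, value in records:
        index[(kind, value)].append(source)
    return index


def link(index, kind, value):
    """Naive linking: return every source that mentions the same indicator."""
    return sorted(index.get((kind, value), []))


def validate(index, whitelist):
    """Split indicators into kept and discarded based on a local whitelist."""
    kept, discarded = {}, {}
    for (kind, value), sources in index.items():
        target = discarded if value in whitelist else kept
        target[(kind, value)] = sources
    return kept, discarded
```

With the sample data, `link(build_index(RECORDS), "ip", "10.10.10.10")` returns the two feeds plus the panda report, which is exactly the “show me all records with the same IP” query; `validate` then sets aside the whitelisted address instead of silently deleting it.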
As a result, the company may follow a process similar to this (so far, many variations have been observed so treat this as a sample and not as a “best practice”; note that this sample is NOT taken from any particular organization but is “inspired by a real story”):
- Collect TI data from public, community and commercial sources; or from wherever else you can get some TI
- Store data in a way that takes care of duplicate data and links it together, as well as links it to pre-existing data; retain counts of how many times each entity appears across sources (it goes without saying that any and all deduplication has to be 100% lossless)
- Enrich newly loaded data with required external context, such as DNS, Whois, lookups of hashes against public services like VirusTotal, etc.
- Relate enriched TI data such as IP to domain to URL, domain to registrant, IP to ISP and geographical area, IP to campaign, file hash to malware family, etc to enable future insights and analysis; relate across TI domains such as malware data to phishing to intrusion data (ideally, relating to strategic TI such as adversary interests and goals may happen here)
- Validate and contextualize the newly loaded data with other historical records such as logs and flows (typically batched or upon major data change) as well as old incident records and existing “proven” TI data sets
- Update adversary profiles (if maintained) based on newly loaded, enriched and related TI data
- If desired, adjust confidence and priority ratings for each entity (for example, a new bad IP from an ISP with a long history of hosting malware may be up-voted in priority; an entity that shows up on many lists may be up-voted in confidence)
- Convert TI data into tool-specific formats based on the entry type (malware IPs into NIDS signatures, other bad IPs into SIEM watchlists, email subjects into DLP rules, file hashes into ETDR or even EPP/AV tool rules, etc.)
- Distribute TI based on the type to the right tool for future real time (or batched) matching (while doing so, you may want to clean some old TI-derived rules)
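A few of the steps above (lossless deduplication with counts, confidence adjustment, and routing by entry type) can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the `Entity` class, the `store`, `confidence` and `route` functions, the `ROUTES` table and the “cap at five sources” scoring rule are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    kind: str
    value: str
    # Every sighting is kept as (source, raw record), so dedup loses nothing.
    sightings: list = field(default_factory=list)

    @property
    def source_count(self):
        """Number of distinct sources that reported this entity."""
        return len({source for source, _ in self.sightings})


def store(entities, source, kind, value, raw):
    """Merge a record into an existing entity, or create one if it is new."""
    # Assumed normalization: trim and lowercase; the raw record is kept as-is.
    key = (kind, value.strip().lower())
    entity = entities.setdefault(key, Entity(kind, key[1]))
    entity.sightings.append((source, raw))
    return entity


def confidence(entity, max_sources=5):
    """Assumed rule: more independent sources means higher confidence (0..1)."""
    return min(entity.source_count, max_sources) / max_sources


# Hypothetical mapping from entry type to the tool that should receive it.
ROUTES = {
    "malware_ip": "nids",
    "ip": "siem_watchlist",
    "email_subject": "dlp",
    "file_hash": "endpoint",
}


def route(entities):
    """Group entity values by destination tool; unknown types are skipped."""
    out = {}
    for entity in entities.values():
        tool = ROUTES.get(entity.kind)
        if tool is not None:
            out.setdefault(tool, []).append(entity.value)
    return out
```

The design choice worth noting is that deduplication merges on a normalized key but appends the raw record to `sightings`, so the count of sources survives and nothing from the original feeds is thrown away.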
Note that the relating step may not necessarily make TI more actionable, but it sure makes it more useful for triaging and investigating events and incidents; essentially, it turns TI into more useful context.
Finally, some of you may ask: “but what about the internally sourced intel?” Well…wait until the next post to find out!
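To illustrate how related data becomes triage context, here is a minimal sketch that treats relationships as a graph and pivots outward from one indicator. The edge list, the `type:value` node naming, and the `build_graph` and `related` functions are hypothetical choices for this example; the breadth-first walk is just one simple way to do the pivoting.

```python
from collections import defaultdict, deque

# Hypothetical relationships produced by the relating step.
EDGES = [
    ("ip:10.10.10.10", "domain:bad.example"),
    ("domain:bad.example", "url:http://bad.example/stuff.zip"),
    ("url:http://bad.example/stuff.zip", "hash:abcd1234"),
    ("hash:abcd1234", "actor:such-and-such panda"),
    ("ip:192.0.2.7", "domain:other.example"),
]


def build_graph(edges):
    """Build an undirected adjacency map from pairs of related entities."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def related(graph, start, max_hops=2):
    """Breadth-first pivot: map each reachable entity to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]  # the starting indicator is not its own context
    return seen
```

Starting from an IP seen in an alert, `related(build_graph(EDGES), "ip:10.10.10.10", max_hops=4)` walks through the domain, URL and file hash to the threat actor; none of that is a blocking rule, but all of it is context an analyst would want during triage.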
Posts related to this research project: