“There are two, seemingly conflicting, views on how to formulate a hypothesis for big data analysis: via data exploration or by having a goal. Exploration within the frame of having a goal is an expected work pattern with big data.” (source: “No Data Scientist Is an Island in the Ocean of Big Data”, another excellent GTP piece on big data)
This thinking, as applied to information security, is a drastic departure from the “which reports do I run?” and “how do I tweak my correlation rules?” thinking that dominates the land of SIEM, the most analytics-heavy security product category today.
- Are you ready to explore your data?
- Do you have clear goals for delving into the data?
If the answer is “no, I want to be told what matters to me” and “no, I just want it all in Hadoop”, sadly, big data approaches are likely not for you. Well, you can try them, but prepare to be sorely disappointed after spending a lot of money and time.
Let’s tackle these one by one:
- We touched on the subject of security data exploration when talking about NFT and ETDR tools (see “Use Cases for Network Forensics Tools”, “Endpoint Visibility Tool Use Cases” and “Alert-driven vs Exploration-driven Security Analysis”). Indeed, organizations are starting to explore their log stores, packet stores and endpoint trace stores in order to discover malware and other indicators of attackers’ activity. Exploring unstructured big data piles, however, is much harder and may involve text analytics, hardcore statistical methods and other esoteric disciplines, far removed from the traditional security skill sets (it is not all about the keyword search, you know).
- Regarding the goals, the same research reminds us that “analysis designing consists of formulating viable business [security, in this case] and analytical hypotheses” and then iterating using the available data. Are you collecting data just in case? Because “Hadoop is cheap”? Start by formulating clear goals and then testing them against the data. Can I find out who touches my sensitive applications maliciously? Is there any way to mine my web logs to find early recon? Do I have traces of phishing “backscatter” and how do I find them?
And of course, “having a goal and opportunistic exploration are not mutually exclusive. Exploration eventually leads to formulating concrete hypotheses through multiple iterations of honing a goal.” (same document)
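To make the “goal first, then iterate” idea concrete, here is a minimal sketch of how the web log recon question above might be turned into a testable hypothesis: “clients probing for many distinct nonexistent pages may be doing early recon.” Everything specific here is an assumption for illustration, not a recommended detection: the Common Log Format pattern, the hypothetical `find_recon_candidates` function, the threshold of three distinct 404 paths, and the made-up sample lines (which use documentation-only IP ranges).

```python
import re
from collections import defaultdict

# Assumption: logs are in Common Log Format; adjust the pattern for your own format.
CLF_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def find_recon_candidates(log_lines, min_distinct_missing=3):
    """Return {ip: count} for clients that requested at least
    `min_distinct_missing` distinct paths answered with 404.

    Hypothetical helper: the threshold is an arbitrary starting point
    meant to be tuned over several iterations against real data.
    """
    missing_paths = defaultdict(set)
    for line in log_lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not parse
        if match.group("status") == "404":
            missing_paths[match.group("ip")].add(match.group("path"))
    return {
        ip: len(paths)
        for ip, paths in missing_paths.items()
        if len(paths) >= min_distinct_missing
    }


# Made-up sample data for illustration only.
sample_lines = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /admin.php HTTP/1.1" 404 512',
    '203.0.113.5 - - [10/Oct/2013:13:55:37 +0000] "GET /phpmyadmin/ HTTP/1.1" 404 512',
    '203.0.113.5 - - [10/Oct/2013:13:55:38 +0000] "GET /wp-login.php HTTP/1.1" 404 512',
    '203.0.113.5 - - [10/Oct/2013:13:55:39 +0000] "GET /admin.php HTTP/1.1" 404 512',
    '198.51.100.7 - - [10/Oct/2013:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2013:13:56:05 +0000] "GET /old-page.html HTTP/1.1" 404 512',
]

candidates = find_recon_candidates(sample_lines)
print(candidates)
```

The point is not this particular heuristic. A first pass like this will likely be noisy (broken links and crawlers also generate 404s), and refining the hypothesis against what the data actually shows is exactly the iteration the quote describes.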
Finally, “there is a common illusion that hiring a data scientist solves all big data needs.” Agreed, that is an illusion! However, keep the opposite in mind as well: NOT hiring a data scientist probably solves NONE of the big data needs (after all, you miss 100% of the shots you don’t take).
Related posts on the topic of big data for security:
- More On Big Data Security Analytics Readiness
- Broadening Big Data Definition Leads to Security Idiotics!
- Next Research Project: From Big Data Analytics to … Patching
- 9 Reasons Why Building A Big Data Security Analytics Tool Is Like Building a Flying Car
- “Big Analytics” for Security: A Harbinger or An Outlier?
- All posts tagged big data